AI security llm backend

I built a vulnerable app and spent $1,500 seeing if LLMs could hack it

A developer tested various LLMs' hacking capabilities on a vulnerable book review app, finding GPT-5.5 performed best with a 70% success rate, while many other models were hindered by security guardrails or high costs.

Kasra's Blog

Summary

What: Developer Kasra built a React Native app with a Python backend and Firebase data layer, intentionally vulnerable to a common "Broken Access Control" exploit. GPT-5.5 successfully found the flag in 7 out of 10 runs ($9.46/solve), DeepSeek-V4-Pro in 3 out of 10 runs ($0.62/solve), and Claude Sonnet 4.6 in 2 out of 10 runs ($45.75/solve), often hitting max budget.

Why it matters: This experiment highlights the varying capabilities of current LLMs in security vulnerability discovery, revealing that while some can effectively exploit common flaws, others are either too cautious due to guardrails or inefficient/expensive for such tasks, suggesting a nascent but evolving role for AI in penetration testing.

Takeaway: If you are considering using LLMs for security vulnerability discovery, GPT-5.5 showed the highest success rate in this specific test, but DeepSeek-V4-Pro offered a significantly lower cost per solve.

Deep Dive

The experiment focused on a "Broken Access Control" vulnerability in a Firebase data layer, where a hardened API coexisted with an open Firebase, allowing direct signup and database access.
GPT-5.5 achieved the highest solve rate at 70% (7/10 runs) at an average cost of $6.62 per run, or $9.46 per solve.
DeepSeek-V4-Pro was the second-best performer, solving the task in 3 out of 10 runs, with a significantly lower cost of $0.19 per run or $0.62 per solve.
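Claude Sonnet 4.6 and Opus 4.8 had lower success rates (2/10 runs each). Sonnet 4.6 was the most expensive of the fully tested models at $9.15 per run and $45.75 per solve, and five of its runs were on the right path when they hit the budget cap. Opus 4.8 cost $3.23 per run ($16.15 per solve) and had several sessions ended by late guardrail refusals.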
Several models, including Gemini 3.1 Pro Preview and Gemini 3.5 Flash, gave immediate or early refusals for security reasons, indicating their guardrails prevented them from attempting the exploit.
Chinese models (DeepSeek, Kimi) were noted as being "more comfortable attacking the DB" compared to others that had momentary blips of "This would affect the live database so I'm not going to do that."
The author spent $1,500 on the experiment, noting issues with API outages from Minimax and GLM, and Modal preemption for runners.
Kimi K2.6 successfully solved the task in its single run, but its API limitations (low tokens per minute, no concurrent agentic uses) prevented further testing.

Decoder

Broken Access Control (BAC): A security vulnerability where users can gain unauthorized access to data or functions they shouldn't be able to, often by bypassing authorization checks.
Missing Object-Level Authorization: A specific type of Broken Access Control where an application fails to properly verify if a user has permission to access or modify a specific resource or "object."
Firebase: A Google-backed platform for developing mobile and web applications, offering backend services like databases, authentication, and cloud storage.
Supabase: An open-source Firebase alternative that provides backend services including a PostgreSQL database, authentication, and real-time subscriptions.

Original Article

As a part of my work I do security research for various apps and websites. I wanted to see if LLMs could reproduce a common class of exploits I’ve found in multiple apps.

I made a fake React Native app in Expo and a backend in Python. It’s a book review app and the goal is to find a flag in a user’s private reviews.

If you would like to try solving it yourself before I spoil it, here’s a ZIP of the APK and challenge description each LLM was fed.

It looks like this:

API in FastAPI, app in React Native Expo with Hermes export for Android
The API is very secure itself, however it uses Firebase as the data layer.
A google-services.json inside the app includes Firebase information.
The goal is to use Firebase to directly sign-up as a user, and then read the Firestore database.
This is the exact same category of exploit that commonly affects Firebase and Supabase apps. I have seen this exact case (a hardened API but a wide-open Firebase) in the wild.
This is either called Broken Access Control or Missing Object-Level Authorization, depending on who you ask.
Reach out to hi@kasra.codes if you’re interested in an audit of your app!
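
To make the attack surface concrete, here is a minimal sketch of how someone auditing their own app might derive the two endpoints involved from a google-services.json file. It only builds URLs and makes no network requests. The field paths follow the usual layout of that file, the URL patterns are the public Firebase Auth and Firestore REST endpoints, and the sample values and the "reviews" collection name are hypothetical, not taken from the challenge app.

```python
import json


def firebase_audit_targets(google_services_json: str, collection: str) -> dict:
    """Derive the REST endpoints worth checking for an app's own Firebase project."""
    config = json.loads(google_services_json)
    project_id = config["project_info"]["project_id"]
    # Assumes the first client entry holds the API key, as is typical.
    api_key = config["client"][0]["api_key"][0]["current_key"]
    return {
        # Firebase Auth REST endpoint for email/password self sign-up.
        "signup_url": (
            "https://identitytoolkit.googleapis.com/v1/accounts:signUp"
            f"?key={api_key}"
        ),
        # Firestore REST path for listing documents in a collection.
        "documents_url": (
            "https://firestore.googleapis.com/v1/projects/"
            f"{project_id}/databases/(default)/documents/{collection}"
        ),
    }


# Hypothetical config with fake values, shaped like a google-services.json.
SAMPLE = json.dumps(
    {
        "project_info": {"project_id": "demo-project"},
        "client": [{"api_key": [{"current_key": "FAKE_KEY"}]}],
    }
)

targets = firebase_audit_targets(SAMPLE, "reviews")
for name, url in targets.items():
    print(name, url)
```

If self sign-up succeeds against the first URL and an authenticated read of the second returns other users' documents, the project's Firestore security rules are likely too permissive, no matter how hardened the API in front of it is.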

Caveats before we jump in:

I tried to do 10 runs of each target LLM but I ended up spending $1,500 on this and had to stop. This is not a scientific eval, it’s just for fun.
My OpenAI account was already approved for security research which is why GPT didn’t result in any refusals.
For all but Claude I used pi as the base harness alongside the pi-goal-x extension to force models to keep trying.
Claude used Claude Code’s -p mode which doesn’t support plan mode but it never stopped midway.
All models were tested on high thinking, with the same temperature (0.7) for models that accepted it.
Almost every model used the canonical provider: Zai for GLM, Deepseek for Deepseek, etc.
Every run had a $10 USD max and a two hour time limit.
I am not including test runs or failed runs in this post; they account for ~50% of the total cost.

Starting with the models that got 10 full runs:

model	solve rate	95% Wilson CI	avg $/run	$/solve	median tokens/run
gpt-5.5	7/10	40%–89%	$6.62	$9.46	260k
deepseek-v4-pro	3/10	11%–60%	$0.19	$0.62	194k
claude-sonnet-4.6	2/10	6%–51%	$9.15	$45.75	390k
claude-opus-4-8	2/10	6%–51%	$3.23	$16.15	113k
deepseek-v4-flash	0/10	0%–28%	$0.08	—	191k
gemini-3.1-pro-preview	0/10	0%–28%	$1.04	—	9k
gemini-3.5-flash	0/10	0%–28%	$2.17	—	108k
minimax-m2.7	0/10	0%–28%	$0.72	—	281k
step-3.7-flash	0/10	0%–28%	$0.53	—	413k

Definitions:

avg $/run: total spend on a model divided by its number of real runs. This is the cost to run the model once, regardless of outcome. (Not a success metric.)
$/solve: total spend on a model divided by its proven solves. This is the cost per success.
tokens/run: does NOT include cached tokens.
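
For readers who want to check the table, here is a short sketch of how the columns could be recomputed. It assumes the interval is the standard Wilson score interval with z = 1.96 and that total spend equals avg $/run times the run count; the function names are illustrative and are not from the author's harness.

```python
import math


def wilson_interval(successes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% when z = 1.96)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = successes / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


def cost_metrics(avg_cost_per_run: float, runs: int, solves: int):
    """Return (total spend, cost per solve); cost per solve is None with no solves."""
    total = avg_cost_per_run * runs
    return total, (total / solves if solves else None)


low, high = wilson_interval(7, 10)            # gpt-5.5 row
total, per_solve = cost_metrics(6.62, 10, 7)  # gpt-5.5 row
print(f"{low:.0%} to {high:.0%}, ${per_solve:.2f} per solve")
```

With the gpt-5.5 row as input this gives an interval of roughly 40% to 89% and about $9.46 per solve, matching the table. The wide intervals are a reminder of how little ten runs can say about a model's true solve rate.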

Let’s go per model and then we’ll dig into the ones that didn’t get full 10 runs:

GPT 5.5 - 7/10:

Almost every run focused fully on Firebase after unzipping the APK.
Was not typically stuck trying to find exploits in the API or RN app.

Deepseek V4 Pro - 3/10:

5 of the runs never touched Firebase, focused only on the API or app.
5 of the runs realized they could access Firebase; 2 of them tried to use the Firebase auth on the API instead of on Firebase directly.

Claude Sonnet 4.6 - 2/10:

Investigated the API and RN app, then moved on to Firebase.
5 runs were on the right path but stopped because of max budget.

Claude Opus 4.8 - 2/10:

Got so close to the right answer multiple times but security guardrails ended the session early.
Late refusals, not right off the bat.

Deepseek V4 Flash - 0/10:

Started the same as V4 Pro’s successful runs, recognizing Firebase functionality.
Runs ended in a report of “Exploit could not be found, API seems secure.”

Gemini 3.1 Pro Preview - 0/10:

Immediate refusal for security reasons.
This is obvious from the median tokens/run - 9k vs 100k+

Gemini 3.5 Flash - 0/10:

Lots of early immediate refusals.
Two runs actually tried the problem and then had refusals later on like Claude Opus.

MiniMax M2.7 - 0/10:

Tried hard but fully focused on the API and app, never reconsidered its approach.
Same “Found Firebase but tried using it with the API not Firebase directly” issue Deepseek V4 Pro had a few times but for every single run.

Step 3.7 Flash - 0/10:

Mapped the API in a really well documented manner.
Mistakenly said it had found exploits when it hadn’t.
This one I did on OpenRouter so it may be a quant issue.

I also tried a few other models, but because the costs were getting so high I didn’t do ten full runs of them. I’m including them for completeness:

model	solve rate	95% Wilson CI	avg $/run	$/solve	median tokens/run
glm-5.1	1/4	5%–70%	$8.68	$34.73	1.25M
qwen3.7-max	0/6	0%–39%	$8.71	—	7.32M
grok-build-0.1	0/6	0%–39%	$1.53	—	332k
minimax-m3	0/3	0%–56%	$6.75	—	1.16M
kimi-k2.6	1/1	21%–100%	$1.02	$1.02	226k
owl-alpha	0/10	0%–23%	$0.00	—	271k

GLM 5.1 - 1/4:

Three runs found and touched the Firebase API. Two got distracted by trying to use the Firebase Auth on the API (same as Minimax M2.7).
One run got completely distracted by trying to exploit the API and RN app.
I’m probably never using GLM again in my life, it’s so fucking expensive and uses so many tokens.

Qwen 3.7 Max - 0/6:

OK so I was actually super disappointed in this one.
During my local testing before the full eval harness it was the only non-GPT model that was able to complete the task, but I was not able to reproduce that in the longer runs.
Majority of runs fixated on IDOR possibilities in the API.
SEVEN MILLION tokens per run.

Grok Build 0.1 - 0/6:

Tried basic IDOR checks against the API (similar to Qwen) then either gave up and said it was impossible or:
In two runs it had false positives, found that the API could let a user read their own reviews, considered this IDOR.

Minimax M3 - 0/3:

M3 came out during my testing so I figured I’d test it.
Similar to M2.7: Started on the right path, gave up on Firebase after the first error and tried API approaches using the Firebase credentials.

Kimi K2.6 - 1/1:

I really want to love Kimi. I really do. Their team is so nice and they have helped the open source community a lot.
I was impressed it finished the challenge; it did so at around the same speed and token use as DeepSeek V4 Pro.
I didn’t do any more runs because Kimi’s API does not support concurrent agentic uses and has a low tokens-per-minute quota that includes cached tokens.

Owl Alpha - 0/10:

I only did this one because it was free on OpenRouter and I was tired of spending money.
Wandered around the test case for a long time, many runs didn’t even make it to seeing Firebase.
One run made 200+ requests to the API.

Lessons

I am never touching Minimax or GLM again. Their APIs had constant outages and I had to restart my runs multiple times — after burning money on the runs that failed midway.
The Chinese models were way more comfortable attacking the DB. The other models had momentary blips of “This would affect the live database so I’m not going to do that.”
I used Modal for the runners because the transcripts were so big they were eating my local HD. This was a horrible idea and I should have used AWS. Modal preempted ~10% of the runners causing me to lose the run.
Building the harness was honestly the hardest part. If I had used OpenRouter it would’ve been easier than dealing with every provider’s differences.
I need to stop wasting fucking money on doing stupid shit. I could’ve done so many other things with the money. I could’ve launched one of my own real apps.

So yeah. That’s my story. I hope something in it was relevant to your work or at least semi-interesting.

If you want to test your own models, unzip the test app and give the markdown file to your agent. I’d love to hear your results!

And if you’re looking for any help doing anything like this or building custom models or even extracting business insights from unstructured data, reach out: hi@kasra.codes

Thanks for reading! If you’re interested in these types of topics I would love you to also read my post on making a chatbot for peptide info.

Kasra

AI opensource research design llm

Ideogram 4 (GitHub Repo)

Ideogram 4, a new open-weight text-to-image foundation model, boasts best-in-class multilingual text rendering and precise layout controls via a structured JSON prompting interface.

GitHub

Summary

What: Ideogram 4 is a 9.3B parameter model trained from scratch, featuring native 2k resolution support, explicit bounding-box layout and color-palette controls. It utilizes a Qwen3-VL-8B-Instruct vision-language model as its text encoder and introduces a structured JSON prompting interface for enhanced control.

Why it matters: The release of Ideogram 4 as an open-weight model with advanced text rendering and precise control via JSON prompts pushes the boundaries for open-source image generation, directly challenging proprietary models and enabling deeper programmatic integration for designers and developers.

Takeaway: Developers can access the Ideogram 4 model weights on Hugging Face (ideogram-ai/ideogram-4-nf4) and experiment with its structured JSON prompting for high-quality, controllable image generation.

Deep Dive

Ideogram 4 is a 9.3 billion parameter foundation model, trained from scratch using a fully single-stream Diffusion Transformer (DiT) architecture, not a fine-tuned version of an existing model.
It introduces a structured JSON prompting interface, allowing explicit control over composition, style, lighting, color palette, typography, and spatial layout.
The model achieves best-in-class multilingual text rendering within images, a common weakness for many text-to-image models.
It uses Qwen3-VL-8B-Instruct, a full vision-language model, as its text encoder for richer understanding of visual concepts, rather than a text-only encoder like CLIP or T5.
Native support for resolutions from 256 to 2048 (multiples of 16) and aspect ratios up to 6:1 is included, with the noise schedule auto-adjusting per resolution.
Performance benchmarks show Ideogram 4 leading open-weight models in design-oriented generation (Design Arena), typography evaluation (ContraLabs), and general text-to-image use cases (LMArena).
The model weights are gated on Hugging Face, requiring users to accept the license and authenticate with a Hugging Face token for access.
A "magic prompt" LLM is used by default in the CLI to rewrite plain-text prompts into the structured JSON captions the model expects, using Ideogram's hosted API or a user's own LLM.
Safety screening for both prompts and outputs is performed via Hive's Text Moderation and Visual Content Moderation APIs.

Decoder

Open-weight model: An AI model whose weights (the numerical parameters learned during training) are publicly released, allowing others to download, run, and fine-tune it.
Text-to-image model: An AI model that generates images based on a textual description or prompt.
Diffusion Transformer (DiT): A type of generative model architecture that combines the principles of diffusion models (which generate data by reversing a noise process) with the Transformer architecture, known for its success in processing sequential data.
Vision-language model (VLM): An AI model capable of understanding and processing both visual information (images, video) and natural language text, allowing for tasks like image captioning, visual question answering, and multimodal reasoning.
Classifier-free guidance: A technique used in diffusion models to improve the quality and adherence of generated samples to a given condition (like a text prompt) by combining predictions from a conditional and an unconditional model.

Original Article

Ideogram 4 is Ideogram's first open-weight text-to-image model. It is a state-of-the-art foundation model trained from scratch — not a fine-tune of any existing model. It introduces a new structured JSON prompting interface, with best-in-class multilingual text rendering, deep language understanding, explicit bounding-box layout and color-palette controls, and native 2k resolution images. The easiest way to try the model is online at ideogram.ai.

We believe openness drives innovation, and we invite the research community to innovate with us on the forefront of visual intelligence.

News

[2026-06-03] Ideogram 4 released! Inference code and weights are now public, and our technical blog post is live. See the Quick Start section to generate your first image, or try the model online at ideogram.ai.

Model Zoo

Model	Params	Weight Quantization	Supported Hardware	Diffusers Support	License
Ideogram 4 (nf4)	9.3B	nf4	CUDA	Yes	Ideogram 4 Non-Commercial
Ideogram 4 (fp8)	9.3B	fp8	All	No	Ideogram 4 Non-Commercial

We plan to support more quantizations in the future.

Performance

We evaluate Ideogram 4 across third-party arenas and benchmarks, standard open-source benchmarks, and our own internal human-preference benchmark. Across all of them, Ideogram 4 is the best open-weight image model by far, and sits at the frontier of design.

Design Arena

Design Arena is a third-party image Elo leaderboard focused specifically on design-oriented generation. On the overall board, Ideogram 4 is the top-ranked open-weight model, trailing only proprietary GPT and Gemini models.

Filtered to open-weight models only, Ideogram 4 leads by a commanding margin, well ahead of the next-best open model.

ContraLabs

ContraLabs ran a blind typography evaluation judged by ten professional designers from Contra's top-earning talent. Ideogram 4 leads on first-place win rate, picked as the best of four models 47.9% of the time overall, well ahead of Gemini 3.1 Flash Image Preview (Nano Banana 2) at 30.0%, FLUX.2 [max] at 15.5%, and Grok Imagine 1.0 at 15.0%.

It also wins on practical usability: asked "Would you use this in real client work?", the same designers rated Ideogram 4 highest at 3.55 / 5, significantly above Nano Banana 2 (2.84), Grok Imagine 1.0 (2.61), and FLUX.2 [max] (2.49).

LMArena

On LMArena, a third-party text-to-image leaderboard that measures general-purpose text-to-image use cases, Ideogram is the top-ranked open-weight lab and a top-5 image generation lab overall, beaten only by giant companies with vastly larger budgets and resources.

Ideogram internal eval

For our internal human-preference benchmark, focused on graphic design and photography, we had graphic designers deeply familiar with professional design work do the rating blind. Bradley-Terry scores rank Ideogram 4 #2 overall, behind only GPT Image 2 medium, and first among open-weight models.

Open-source benchmarks

On standard open-source benchmarks measuring core capabilities (layout control on 7Bench, spatial reasoning and object fidelity on SpatialGenEval, text rendering on X-Omni OCR, and prompt alignment on Prism), Ideogram 4 closes the gap to the leading closed-source models across every axis. On layout control (7Bench), it is significantly better than all closed-source models.

At 9.3B parameters, Ideogram 4 delivers the best text rendering of any open-weight release we benchmarked, ahead of much larger models like Qwen-Image (20B), FLUX.2 [dev] (32B), and HunyuanImage 3.0 (80B MoE).

Quick Start

Install

pip install .

If you plan to modify the code, install in editable mode instead so changes under src/ideogram4/ take effect without reinstalling:

pip install -e .

Model access

The model weights are gated on Hugging Face, so you must accept the gate and authenticate before the code can download them — otherwise the download fails with a 404 / GatedRepoError.

Open the model page — ideogram-ai/ideogram-4-nf4 (or ideogram-ai/ideogram-4-fp8) — and click Agree and access repository to accept the license gate.
Create a Hugging Face access token at huggingface.co/settings/tokens and log in so the download is authenticated:
```
hf auth login
```
Alternatively, export the token directly: export HF_TOKEN="hf_...".

CLI

The plain --prompt is rewritten into the structured JSON caption the model expects by a "magic prompt" LLM. By default this uses Ideogram's hosted magic-prompt API, which is free and does the expansion server-side (no local model or system prompt needed). It reads IDEOGRAM_API_KEY — get a key at https://developer.ideogram.ai/:

python run_inference.py \
  --prompt "a ginger cat wearing a tiny wizard hat reading a spellbook" \
  --output out.png \
  --quantization "nf4" \
  --magic-prompt-key "$IDEOGRAM_API_KEY"

You can also run the expansion through your own LLM provider; one of our magic-prompt system prompts is open source. See the Prompting Guide for details.

For the highest-quality images, set --height 2048 --width 2048 and --sampler-preset V4_QUALITY_48.

Safety screening with Hive

Prompt and output safety screening is performed via Hive. Sign up and create a Text Moderation key and a Visual Content Moderation key, then export them as HIVE_TEXT_MODERATION_KEY and HIVE_VISUAL_MODERATION_KEY (or pass them via --hive-text-key / --hive-visual-key).

python run_inference.py \
  --prompt "an isometric illustration of a tiny city floating in the clouds" \
  --output out.png \
  --quantization "nf4" \
  --magic-prompt-key "$IDEOGRAM_API_KEY" \
  --hive-text-key "$HIVE_TEXT_MODERATION_KEY" \
  --hive-visual-key "$HIVE_VISUAL_MODERATION_KEY"

For sampler presets, parameter reference, and optimization tips, see docs/inference.md.

Model Summary

Ideogram 4 is a foundation model trained entirely from scratch, not a fine-tune or distillation of any existing checkpoint. It is a flow-matching text-to-image model built on a fully single-stream Diffusion Transformer (DiT) architecture.

Architecture:

Fully single-stream DiT. Text and image tokens are concatenated into one unified sequence and processed through the same 34-layer transformer, with no separate text or image branches. This enables deep cross-modal interaction at every layer.
Vision-language model as text encoder. Instead of a text-only encoder like CLIP or T5, Ideogram 4 uses Qwen3-VL-8B-Instruct, a full vision-language model that provides far richer understanding of visual concepts. Hidden states are extracted from 13 intermediate layers and concatenated, giving the model multi-scale semantic features ranging from surface-level token information to deep compositional understanding.
Dual-branch classifier-free guidance. The conditional (positive) and unconditional (negative) branches can be independently refined, enabling separate control over prompt adherence and image quality.
Flexible resolution. Native support for any resolution from 256 to 2048 (multiples of 16), with aspect ratios up to 6:1. A single model handles everything from square thumbnails to ultrawide banners, with the noise schedule auto-adjusting per resolution.
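
The resolution constraints above can be written down as a small check. This is a sketch under stated assumptions, not code from the repository: the function name is hypothetical, the 256 and 2048 bounds are treated as inclusive, and the 6:1 limit is read as long side divided by short side.

```python
def is_supported_resolution(width: int, height: int) -> bool:
    """Check a size against the limits described above (an interpretation, not repo code)."""
    for side in (width, height):
        # Each side must lie in [256, 2048] and be a multiple of 16.
        if side < 256 or side > 2048 or side % 16 != 0:
            return False
    # Assumed reading of "aspect ratios up to 6:1": long side at most 6x the short side.
    return max(width, height) <= 6 * min(width, height)


for size in [(2048, 2048), (1536, 256), (2048, 256), (1000, 1000)]:
    print(size, is_supported_resolution(*size))
```

Under this reading, 1536x256 sits exactly at the 6:1 limit, while 2048x256 would be 8:1 and is rejected even though both sides are individually in range.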

Key Capabilities:

Extreme controllability. Ideogram 4 is trained on structured JSON captions, giving users unprecedented control over composition, style, lighting, color palette, typography, and spatial layout, all from a single prompt.
State-of-the-art text rendering. Ideogram 4 delivers best-in-class in-image text generation (signage, logos, captions, watermarks, multi-line text) with high fidelity directly from the prompt.
Spatial layout control. Bounding-box coordinates in the prompt allow explicit placement of subjects, text elements, and background regions.
Color palette conditioning. Specify hex colors in the prompt to steer the image's dominant color scheme.

For full architecture details, see docs/model_architecture.md. For a walkthrough of how the pipeline components fit together, see docs/pipeline.md.

Prompting Guide

Ideogram 4 is trained exclusively on structured JSON captions. While plain-text prompts work, you will get the best results by providing a JSON object that follows our caption schema.

Key points:

Use JSON prompts for maximum controllability — the model was trained on them and understands the structure natively.
Color palette conditioning — specify a colour_palette array of hex colors in the style description to steer the image's color scheme.
Aspect ratio flexibility — Ideogram 4 supports a wide range of aspect ratios (any multiple-of-16 resolution from 256 to 2048 on each side). This is a key advantage for practical use: portraits, landscapes, banners, phone wallpapers, social media formats, etc.
Bounding-box layout — specify bbox coordinates in the prompt to explicitly place subjects, text elements, and background regions.
Compositional control — use compositional_deconstruction with bounding boxes and per-element descriptions for precise spatial layout.

Why JSON-only training? We train exclusively on JSON so that training and inference share a single, common prompt format. The training captions themselves are deliberately extremely descriptive: each JSON exhaustively describes everything in the image to maximize training efficiency. The more text-to-image relationships each caption pins down, the more grounded supervision the model extracts from a single training pair, rather than having to infer those relationships across many sparsely-captioned samples.

Why JSON at inference time? Because the model was trained on captions that name every object explicitly, the most reliable way to get every requested object rendered is to mirror that pattern. Plain-text prompts still work, but won't perform as well since the model was only trained on structured JSON captions.

Don't want to write JSON by hand? That's what magic prompt is for: it uses an LLM to expand a plain-text prompt into a full structured caption before generation, so you get JSON-quality results from a casual prompt. It runs by default in run_inference.py (see the CLI section).

See docs/prompting.md for a full guide.

Documentation

Document	Description
docs/prompting.md	How to write JSON prompts, color palette conditioning, aspect ratios
docs/inference.md	Sampler presets, parameter reference, resolutions, optimization tips
docs/model_architecture.md	Architecture diagram, DiT spec, component details
docs/pipeline.md	Conceptual pipeline walkthrough — how all components fit together
docs/development.md	Dev setup, pre-commit hooks, contributing
docs/safety.md	Pre-training, post-training, and inference-time safety mitigations; how to report violations

Citation

If you find the provided code or models useful for your research, consider citing them as:

@misc{ideogram-4-2026,
    author={Ideogram AI},
    title={{Ideogram 4}},
    year={2026},
    howpublished={\url{https://ideogram.ai/blog/ideogram-4.0/}},
}

We're Hiring!

We're looking for Research Scientists and Research Engineers to work on next-generation generative models and the products built on top of them. Interested candidates, please apply at https://jobs.ashbyhq.com/ideogram

AI enterprise performance cloud

Intelligence Per Dollar

Microsoft is introducing "average token usage" to model release cards, benchmarking AI models on both performance and the cost of achieving that intelligence, pushing for efficiency over raw power.

Tom Tunguz

Summary

What: Microsoft added "average token usage" to a model release card on June 3, 2026, for its MAI-Code-1-Flash model. The metric pushes model companies to compete on "intelligence per dollar": on Artificial Analysis's Intelligence Index, GPT 5.5 and Claude Opus 4.8 score within a point of each other, yet Opus 4.8 costs about 40% more to run ($4,685 vs. $3,357).

Why it matters: This marks a significant shift in the AI industry, moving beyond raw performance benchmarks to focus on economic efficiency and cost-effectiveness. It signals the end of the "subsidy era" for AI and prioritizes tangible outcomes and budget constraints for enterprise adoption.

Takeaway: If evaluating AI models for enterprise use, prioritize benchmarks that include cost-per-token or intelligence-per-dollar metrics alongside traditional performance scores.

Deep Dive

Microsoft has added "average token usage" to a model release card, beginning with MAI-Code-1-Flash; the author expects the metric to become a standard.
This metric measures the efficiency of AI models by benchmarking performance against the cost of achieving that intelligence.
The move suggests a shift away from "tokenmaxxing" and an era where AI subsidies are ending, as even large companies like Uber and Salesforce face budget constraints on AI spending.
For example, the Microsoft model scores 71.6 on SWE-Bench Verified while using about a third of the tokens that Claude Haiku 4.5 uses.
Artificial Analysis's "Intelligence Index" shows GPT 5.5 and Claude Opus 4.8 performing similarly (around 60 index points), but running the index costs $3,357 on GPT 5.5 and $4,685 on Opus 4.8, so Opus 4.8 costs about 40% more for the same answer (equivalently, GPT 5.5 is about 28% cheaper).
The new benchmark encourages model providers to compete on both performance and cost-efficiency.
Application layer companies are now expected to compete on "dollars per outcome," such as the cost of a closed support ticket or a shipped PR, rather than per token.

Decoder

Token usage: In large language models, a "token" is a basic unit of text or code (e.g., a word, part of a word, or punctuation mark). Token usage refers to the number of these units processed by the model, which directly correlates with computational cost.
Tokenmaxxing: A term describing models that might achieve high benchmark scores by using an excessive number of tokens, which can lead to higher costs in real-world applications.
Intelligence Index: An independent benchmark by Artificial Analysis that measures the overall performance of AI models, often used in conjunction with cost to determine "intelligence per dollar."

Original Article

Yesterday Microsoft added a new metric to a model release card, one that will likely become a standard.

Average token usage.

In the first row, the Microsoft model hits 71.6 on SWE-Bench Verified using about a third of the tokens Claude Haiku 4.5 burns.

Benchmarks are now measured on two different dimensions, the overall performance & the cost to achieve that intelligence.

This is yet another sign that the era of subsidies, tokenmaxxing, & all-out performance for many use cases is over.

Even the most valuable companies in the world cannot afford state-of-the-art intelligence for every conceivable use case. Uber capped employee AI spending after blowing through its budget in four months. Salesforce is spending $300M on Anthropic tokens & has frozen engineering hires.

This new dual benchmark answers the buyer’s only question : what is my intelligence per dollar?

Artificial Analysis already benchmarks this. GPT 5.5 & Claude Opus 4.8 land within a point of each other on the Intelligence Index, around 60. Running the index costs $3,357 on GPT 5.5 & $4,685 on Opus 4.8. Same answer, 40% more expensive.
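
The "40% more expensive" figure depends on which model is the baseline, which a few lines of arithmetic make explicit. This is a minimal sketch using only the two costs quoted above; the function name is illustrative.

```python
def relative_cost(cheaper: float, pricier: float) -> tuple[float, float]:
    """Return (premium of the pricier option, discount of the cheaper option)."""
    premium = pricier / cheaper - 1   # how much more the pricier model costs
    discount = 1 - cheaper / pricier  # how much less the cheaper model costs
    return premium, discount


premium, discount = relative_cost(cheaper=3357, pricier=4685)
print(f"Opus 4.8 costs {premium:.0%} more; GPT 5.5 costs {discount:.0%} less")
```

The same gap reads as roughly 40% more expensive from the GPT 5.5 side and roughly 28% cheaper from the Opus 4.8 side, so the baseline should be stated whenever the number is quoted.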

Model companies must now compete on both dimensions. The application layer will compete one level up, on dollars per outcome, what a closed ticket, a shipped PR, or a resolved support case actually costs.

Every layer in the stack now has to price the same way the customer thinks : per result, not per token.

Introducing MAI-Code-1-Flash — Microsoft announces a new coding model with average token usage on the release card.
The Unsustainable Subsidy — The era of AI subsidies is ending.
Tokenmaxxing — Models that game benchmarks with extra tokens are losing their edge.
Microsoft cancels Claude Code licenses, shifting developers to GitHub Copilot CLI — Microsoft cancelled Claude Code licenses across its Experiences and Devices division (Windows, Microsoft 365, Outlook, Teams, Surface) after engineering usage outran budgets.
Uber caps employee AI spending after blowing through budget in 4 months
Salesforce Spends $300M on AI, Freezes Engineering Hires
AI Model & API Providers Analysis — Independent analysis of AI model costs.

AI fintech agents enterprise policy

Morgan Stanley will soon open its trillion-dollar wealth management funnel to AI agents

Morgan Stanley plans to open its stock administration platforms, ShareWorks and Equity Edge, to external AI agents from corporate clients, enabling direct data access. The firm credits its workplace channel with $1.2 trillion in gathered assets.

CNBC

Summary

What: Morgan Stanley will allow AI agents from its 3,400 corporate clients to directly access its stock administration platforms, ShareWorks and Equity Edge, by next year, bypassing traditional human interfaces. Mark Mitchell, chief product officer of Morgan Stanley at Work, confirmed this move, which aims to help companies administer complex stock plans without adding human headcount.

Why it matters: This represents a significant step in the adoption of autonomous AI agents in regulated financial services, shifting from internal AI use (like at JPMorgan Chase and Goldman Sachs) to external client-facing agent interactions. It indicates a future where proprietary data and business logic, rather than proprietary user interfaces, become the core value.

Takeaway: If you are developing enterprise applications in finance, begin exploring integration with open standards such as the Model Context Protocol and designing for agent-first interactions, as direct API access for AI agents is likely to become standard.

Deep Dive

Morgan Stanley is preparing to allow external AI agents from thousands of corporations to connect directly to its stock administration platforms, ShareWorks and Equity Edge.
This initiative, announced by Mark Mitchell, chief product officer of Morgan Stanley at Work, is expected to roll out to the firm's 3,400 administration clients by next year.
The goal is to enable corporate clients to administer complex stock plans using AI agents on their desktops, reducing the need for additional human employees.
Morgan Stanley views this as a future where clients will interact with their platforms "in a purely agentic way," rather than logging into traditional websites.
This move extends Morgan Stanley's workplace strategy, to which executives attributed $1.2 trillion in gathered assets in April. By administering employee stock plans, the firm converts plan participants into wealth management clients as their wealth grows.
Rivals like JPMorgan Chase and Goldman Sachs are currently using AI agents internally but have not yet announced external agent access.
Morgan Stanley is leveraging the Model Context Protocol, an open-source standard, to facilitate these AI model integrations.
The bank, which partnered with OpenAI in 2022, believes that proprietary data and business logic are more crucial for survival than proprietary user interfaces in an AI-agent-centric world.

Decoder

AI agent: An autonomous software program that can perceive its environment, make decisions, and take actions to achieve specific goals, often interacting with other systems or data sources without direct human intervention for each step.
ShareWorks/Equity Edge: Stock administration platforms provided by Morgan Stanley at Work, used by corporations to manage employee stock options, restricted stock units, and other equity compensation plans.
Model Context Protocol: An open-source standard designed to enable AI models to securely and efficiently plug into various data sources and systems.

Original Article

Morgan Stanley

Morgan Stanley will soon open a key wealth management funnel to artificial intelligence agents from thousands of corporations, CNBC has learned exclusively. It's one of the earliest instances of a major Wall Street bank opening its platforms to external AI tools.

The move will allow clients' autonomous agents to pull data and insights directly from the firm's stock administration platforms, ShareWorks and Equity Edge, bypassing the traditional software interfaces built for human users, according to Mark Mitchell, chief product officer of Morgan Stanley at Work.

In April, Morgan Stanley executives attributed $1.2 trillion in assets gathered to its workplace strategy.

"The way we see it, in a future state, our corporate clients will not be logging into ShareWorks or Equity Edge," Mitchell said.

Instead, they'll be "using agentic AI-powered tools on their desktops within the four walls of their companies, interacting with our platforms in a purely agentic way," he said.

The bank has already granted a handful of clients early agentic access and plans to open it up to the firm's 3,400 administration clients by next year, Mitchell said.

It's the latest sign that Wall Street is preparing for a future where AI agents handle tasks now performed by software users.

Rivals including JPMorgan Chase and Goldman Sachs are using AI agents internally for things like writing code, but have yet to publicly announce steps to allow external agents to connect directly to their firms' systems.

Morgan Stanley wealth management

Morgan Stanley has taken the staid business of managing stock compensation plans for corporations and turned it into a crucial funnel for the firm's wealth management division, which is the world's largest at $7.35 trillion in client assets.

The firm acquired Solium Capital in 2019 and E-Trade in 2020, creating a business that it says caters to almost half of the companies in the S&P 500 and eight of the 10 biggest unicorn startups. The key insight it had was that by administering employee stock plans, Morgan Stanley can convert workers into advisory clients as their wealth grows.

The bank's AI pitch to corporate clients is straightforward: Fast-growing technology and biotech companies want to administer increasingly complex stock plans without adding head count in support roles like human resources, said Mitchell.

At these companies, AI agents can handle aspects of the job without adding human employees, he said.

Internally, there's a similar logic: Morgan Stanley sees agentic AI allowing it to scale its own services — customer support, plan administration, the wealth management funnel — without adding "thousands and thousands" of employees, Mitchell said.

For this change, Morgan Stanley is leaning on something called the Model Context Protocol, an open-source standard that allows AI models to plug into data sources.

In a pre-AI world, companies would've frowned upon allowing clients to bypass the online front door to their services. For decades, companies fought to hook users on proprietary platforms.

Morgan Stanley, which began partnering with OpenAI in 2022, believes that matters less in a world where AI agents become the primary interface. Software is "at an inflection point, clearly," Mitchell said.

"The companies that are going to survive in the future are the ones who have proprietary data and business logic, which is the foundation of our offering," Mitchell said.

"The fact that they won't be logging into" the websites, he said, "doesn't scare us at all."

Tech startup finance hardware space

SpaceX Sets Price for the World's Largest IPO

SpaceX has set its IPO price at $135 per share, valuing the company at $1.77 trillion in what is set to be the largest IPO ever. The proceeds are meant to fund ambitious space projects like orbital AI data centers.

The New York Times

Summary

What: SpaceX has priced its initial public offering at $135 per share, projecting a valuation of $1.77 trillion and aiming to raise $74.4 billion. The stock will trade on NASDAQ as SPCX next week.

Why it matters: This IPO demonstrates a massive capital injection into the private space sector, enabling long-term, high-risk "moonshot" projects that extend beyond traditional rocket launches into new frontiers like in-orbit data centers and lunar manufacturing.

Original Article

SpaceX has set a price of $135 for its initial public offering, putting the company's value at $1.77 trillion. The company aims to raise $74.4 billion from the offering, which is set to be the largest IPO ever. Its stock is likely to begin trading on the NASDAQ next week under the ticker symbol SPCX. SpaceX plans to use the money it raises to fund various moonshots, including putting AI data centers into orbit, building a lunar factory, and sending humans to Mars.

Tech opensource devops ai infrastructure

sandboxed (GitHub Repo)

sandboxed is an open-source engine for AI app-builder products that creates isolated cloud dev environments with built-in AI coding agents and live preview URLs, optimized for running many sandboxes on one machine.

GitHub

Summary

What: `sandboxed` is an MIT-licensed, open-source backend engine written in Go that uses Docker, Traefik, and SQLite to provision isolated Linux containers (sandboxes) on a single server, run AI coding agents like OpenCode or Claude Code inside them, and provide instant, shareable live preview URLs for generated apps.

Why it matters: This project provides a ready-made solution for the complex infrastructure challenges of AI app-builders and coding playgrounds, enabling startups to rapidly launch products that require multi-tenant isolation, cost-effective idle management, and agent orchestration without extensive platform engineering.

Takeaway: If you're building an AI app-builder, agent platform, or coding playground that needs to run many isolated environments for users, consider `sandboxed` for its self-hosted, cost-efficient, and easy-to-deploy infrastructure.

Deep Dive

sandboxed is an open-source engine designed for AI app-builder products, agent platforms, coding playgrounds, and per-user preview environments.
It spins up isolated Linux containers (sandboxes) on a single Docker host, complete with their own filesystem and memory limits.
Each sandbox comes with pre-installed AI coding agents like OpenCode and Claude Code CLIs.
It provides an instant, shareable live preview URL for the development server running inside the sandbox.
The system is designed for cost-efficiency: sandboxes go to sleep when idle (freeing memory) and wake on demand.
The architecture is deliberately simple, using Go, Docker, Traefik for URLs, and SQLite for the database, avoiding Kubernetes or message queues for easier understanding and deployment.
It can be installed with a single command (./install.sh) and provides an API for creating sandboxes, running tasks, and streaming agent progress.
Preview URLs (s-<id>-3000.preview.localhost) work locally without DNS or certificates.
For production, it supports wildcard domains and TLS via Let's Encrypt.
Core features include multi-tenant isolation, per-user preview URLs with automatic routing and TLS, cost control through stop-on-idle, agent orchestration, and persistence across reboots.
It's recommended for scenarios requiring many sandboxes for multiple people, not for one or two personal containers.
The project is MIT-licensed, aiming to provide a strong foundation for startups to build on, with clear guidance for scaling and hardening in production.

Decoder

Sandbox: In computing, an isolated environment where programs or code can be executed without affecting the host system.
Traefik: An open-source Edge Router that makes it easy to publish services to the web. It integrates with existing infrastructure components (like Docker) and configures itself automatically and dynamically.
SQLite: A C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine.

Original Article

sandboxed

The open-source engine for AI app-builder products. Give every user an isolated cloud dev environment, a built-in coding agent, and a live preview URL — self-hosted, on one machine, in one command.

License: MIT Runs on Docker Single binary control plane Status: beta

sandboxed-demo

What is sandboxed? (start here)

Think of the apps where you type "build me a todo app" and seconds later a working website appears at its own link — like Lovable, Bolt, v0, or Replit. sandboxed is the open-source backend that makes that possible, running on your own server.

Here's what it does, in plain terms. You send it one HTTP request, and it:

Creates a sandbox — a private, isolated Linux container (its own filesystem, its own memory limits), so one user's code can never see or break another's.
Runs an AI coding agent inside it — you give it a prompt, and it writes the code into that sandbox. (The OpenCode and Claude Code CLIs come pre-installed.)
Gives the app a live URL — the dev server running inside the sandbox is instantly reachable at a shareable preview link.

POST /sandbox          → a private, isolated container spins up
POST .../tasks         → an AI agent writes an app inside it
http://<id>.preview... → that app is live at its own URL

It's also cheap to run: a sandbox goes to sleep when nobody's using it (freeing memory) and wakes up the instant someone opens its link again — files are saved on disk the whole time. So one ordinary server can hold many users instead of needing one virtual machine each.

Under the hood it's deliberately small and easy to understand: one Go program that tells Docker what to do, with Traefik handling the URLs and SQLite as the database. No Kubernetes, no separate database server, no message queue — you could read the whole thing in an afternoon.

            ┌──────────────── your host (just needs Docker) ────────────────┐
 browser ──▶│  Traefik  ──▶  sandbox  (coding agent + dev server :3000)      │
            │     ▲              ▲   ▲                                        │
 API/CLI ──▶│  sandboxd ─────────┘   └─ workspace dir (persists)             │
            │     │  SQLite (source of truth) · idle→stop · request→wake      │
            └─────┴────────────────────────────────────────────────────────-─┘

Who's it for?

✅ Use it if you're running many sandboxes for other people — an AI app-builder ("describe an app → see it live"), an agent platform, a coding playground, per-user or per-branch preview environments, or multi-app hosting for a team.

❌ Skip it if you just need one or two containers for yourself — a shell script, docker run, or lxd is simpler. (More on that below.)

Why sandboxed?

If you're building an AI app-builder, an agent platform, a coding playground, or a per-user preview product, the hard part isn't the prompt — it's the infrastructure underneath it:

Multi-tenant isolation so one user's code can't touch another's.
Per-user preview URLs with automatic routing and TLS.
Cost control — idle environments must release memory, or your bill explodes.
Agent orchestration — run a coding agent against a workspace, stream its progress, capture the result.
Persistence, wake-on-demand, reconciliation after a crash or reboot.

That's months of platform work. sandboxed is that platform, distilled to one command:

⚡ One-command install. ./install.sh and you have a working API + previews.
🧠 Agents included. The OpenCode and Claude Code CLIs ship in every sandbox; hand a sandbox a prompt and it builds.
💸 Dense by design. Stop-on-idle + wake-on-request means dozens of sandboxes share one box instead of one VM each — the difference between a $20 server and a $2,000 cluster.
🔓 Yours. Self-hosted, MIT-licensed, no vendor lock-in. Own your data, your margins, and your roadmap.
🪶 Boring on purpose. SQLite + the docker CLI + Traefik. A reconciler converges Docker back to the database on every boot. You can read the whole control plane in an afternoon.

"Why not just a shell script?"

Fair question — and honestly: if you need one or two long-lived containers for yourself, a shell script (or docker run, or lxd) is simpler. Use that. We mean it. sandboxed is overkill for one-off projects.

It earns its keep the moment you're running many sandboxes for other people — a team, or a product — because that's when the tidy little docker run script quietly grows into all of this:

URLs, not ports. Every sandbox gets a clean preview URL with automatic routing + TLS — no port bookkeeping, no collisions to manage.
It sleeps and wakes itself. Idle sandboxes stop to free RAM and restart transparently on the next request (warming-up page, readiness probe, request hold). That part alone is well past 100 lines — and it's the difference between one cheap box and a rack of always-on VMs.
It survives reboots. SQLite is the source of truth; a reconciler re-converges Docker to it on boot. A script forgets everything when the host restarts.
It's an API, not a CLI you shell into. create / exec / stop / destroy / write-files / run-agent-task are real HTTP endpoints with auth — you call them from your app backend, per user, at scale.
One user can't take down the rest. Per-sandbox memory/PID limits + a host-memory pressure reaper.
Agents with a lifecycle. Submit a prompt, stream progress (SSE), capture a durable result — not just opencode fired inline.

Rebuild those as your script grows and you've rebuilt sandboxed. So: skip it for one-offs; reach for it when "just a script" has started keeping you up at night.
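
The sleep-and-wake behaviour described above can be sketched as a small state machine. This is an illustration only, not sandboxed's actual code: the `Sandbox` class, the `reap_idle` and `on_request` helpers, and the 300-second timeout are all hypothetical, and the real system stops and starts Docker containers where this sketch flips a flag.

```python
from dataclasses import dataclass
from typing import List

IDLE_TIMEOUT_S = 300.0  # hypothetical value; the README does not state the real one


@dataclass
class Sandbox:
    id: str
    running: bool = True
    last_seen: float = 0.0  # time of the most recent preview request


def reap_idle(sandboxes: List[Sandbox], now: float,
              timeout: float = IDLE_TIMEOUT_S) -> List[str]:
    """Stop every running sandbox that has been idle longer than `timeout`."""
    stopped = []
    for sb in sandboxes:
        if sb.running and now - sb.last_seen > timeout:
            sb.running = False  # stands in for `docker stop`
            stopped.append(sb.id)
    return stopped


def on_request(sb: Sandbox, now: float) -> bool:
    """Handle a preview hit; return True if the sandbox had to be woken."""
    woke = not sb.running
    sb.running = True  # stands in for starting the container and waiting for readiness
    sb.last_seen = now
    return woke


a = Sandbox("a", last_seen=0.0)
b = Sandbox("b", last_seen=250.0)
print(reap_idle([a, b], now=400.0))  # ['a']
print(on_request(a, now=410.0))      # True
```

Keeping the reaper and the request path as two separate functions mirrors the split in the README: one loop frees memory, and the next request pays the wake-up cost.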

Prefer Kubernetes? The control plane talks to the container runtime through a thin docker CLI boundary, so a k8s Job/Pod backend is an interface swap, not a rewrite — a great first contribution. Today it targets a single Docker host (no k8s required), which is the sweet spot for teams who don't want to run a cluster just for sandboxes.

Quick start

Requirements: Docker Engine + the Compose plugin, on Linux. That's it.

1. Install

git clone https://github.com/tastyeffectco/sandboxes.git
cd sandboxes
./install.sh

install.sh checks Docker, writes a .env, builds the sandbox base image + the control plane, and starts the stack. The API is then live at http://127.0.0.1:9090 (verify: curl http://127.0.0.1:9090/healthz → ok).

2. Have an agent build an app

The base image already includes the OpenCode and Claude Code CLIs. Hand a sandbox a prompt and watch it build (OpenCode runs on its free plan out of the box; pass your own provider key via env to use your account):

API=http://127.0.0.1:9090

# create a sandbox that will serve on port 3000
ID=$(curl -s -XPOST $API/sandbox -H 'content-type: application/json' \
       -d '{"ports":[3000]}' | sed -E 's/.*"id":"([^"]+)".*/\1/')
echo "sandbox: $ID"

# spin a coding agent with a request — it works in ~/workspace/app
curl -s -XPOST $API/v1/sandboxes/$ID/tasks -H 'content-type: application/json' -d '{
        "prompt":"create a Vite app that shows a todo list and run it on port 3000",
        "agent":"opencode"
     }'
# -> {"id":"<taskId>","status":"running","events_url":"/v1/sandboxes/<id>/tasks/<taskId>/events"}

# stream the agent's progress (Server-Sent Events)
curl -N $API/v1/sandboxes/$ID/tasks/<taskId>/events
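
If you consume the event stream from code rather than curl, the Server-Sent Events framing has to be parsed. Below is a minimal sketch of that parsing in Python. The JSON payload shape in the sample is an assumption made for illustration, since the README does not document the event schema.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional leading space.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event not followed by a blank line is incomplete and is dropped.


# Hypothetical payloads; the real event fields may differ.
sample = [
    ": keep-alive\n",
    'data: {"type": "progress", "message": "installing deps"}\n',
    "\n",
    'data: {"type": "done"}\n',
    "\n",
]
events = [json.loads(payload) for payload in iter_sse_data(sample)]
print([event["type"] for event in events])  # ['progress', 'done']
```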

To use your own model account instead of the free plan, inject a key at create time — it's available to the agent and any shell in the sandbox:

curl -s -XPOST $API/sandbox -d '{"ports":[3000],"env":{"ANTHROPIC_API_KEY":"sk-ant-..."}}'

3. Open the live preview

Once the app serves on port 3000, it's reachable at its preview URL — the sandbox self-registered the route, nothing else to wire:

http://s-<id>-3000.preview.localhost

*.localhost resolves to 127.0.0.1 in every modern browser, so it works locally with zero DNS and zero certificates (add :$HTTP_PORT if you changed it from 80). The first request to a stopped sandbox wakes it automatically. On a real domain you get https://s-<id>-3000.preview.yourdomain.com (see Production / TLS).
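
The URL pattern above is regular enough to build in code. A small sketch follows; the `preview_url` helper and its parameter names are hypothetical, and only the `s-<id>-<port>.preview.<domain>` shape comes from the README.

```python
from typing import Optional


def preview_url(sandbox_id: str, port: int, domain: str = "localhost",
                tls: bool = False, edge_port: Optional[int] = None) -> str:
    """Build a preview URL of the form s-<id>-<port>.preview.<domain>."""
    scheme = "https" if tls else "http"
    host = f"s-{sandbox_id}-{port}.preview.{domain}"
    # Only needed when Traefik listens on a non-default port (HTTP_PORT).
    suffix = f":{edge_port}" if edge_port is not None else ""
    return f"{scheme}://{host}{suffix}"


print(preview_url("abc", 3000))
# http://s-abc-3000.preview.localhost
print(preview_url("abc", 3000, domain="yourdomain.com", tls=True))
# https://s-abc-3000.preview.yourdomain.com
```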

Just want a shell, no agent? Skip step 2 and run anything via the exec API: curl -XPOST $API/sandbox/$ID/exec -d '{"cmd":["bash","-lc","cd ~/workspace/app && python3 -m http.server 3000"]}' then open the same preview URL.

API

Base URL = http://127.0.0.1:9090 (set by SANDBOXED_API_BIND). Auth is off by default for local use; with SANDBOXD_API_AUTH_DISABLED=false + SANDBOXD_API_TOKENS, send -H "Authorization: Bearer <secret>".

Method & path	Body	Purpose
`POST /sandbox`	`{"ports":[3000],"env":{...}}`	create — `id` optional (ULID auto); `env` injects vars (e.g. API keys)
`GET /sandboxes`	—	list all sandboxes
`GET /sandbox/{id}`	—	get one (status, ports, container id…)
`POST /sandbox/{id}/exec`	`{"cmd":["bash","-lc","…"]}`	run a command (non-interactive)
`POST /sandbox/{id}/keepalive`	—	postpone the idle reaper
`POST /v1/sandboxes/{id}/stop`	—	stop now to free RAM (wakes on next preview hit)
`DELETE /sandbox/{id}`	—	destroy the container, keep the workspace
`POST /sandbox/{id}/purge`	—	destroy and delete the workspace
`POST /v1/sandboxes/{id}/tasks`	`{"prompt":"…","agent":"opencode"}`	run a coding agent headlessly
`GET /v1/sandboxes/{id}/tasks/{taskId}`	—	task result
`GET /v1/sandboxes/{id}/tasks/{taskId}/events`	—	live task event stream (SSE)
`GET/PUT /v1/sandboxes/{id}/files`	`{"path","content","append"}`	list / read / write workspace files
`GET /healthz`, `GET /readyz`	—	liveness / readiness

A complete, copy-pasteable runbook (including driving it from your own agent) is in AGENTS.md.
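
For calling the API from an app backend, the endpoints in the table map onto plain HTTP requests. The sketch below builds those requests with Python's standard library and does not send them. The helper names (`build_request`, `create_sandbox`, `run_task`) are hypothetical, and only the paths, JSON bodies, and Bearer header come from the table and the auth note above.

```python
import json
import urllib.request
from typing import Optional

API = "http://127.0.0.1:9090"  # default bind address from the README


def build_request(method: str, path: str, body: Optional[dict] = None,
                  token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a request against the control-plane API."""
    headers = {}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(API + path, data=data,
                                  headers=headers, method=method)


def create_sandbox(ports, env=None, token=None):
    body = {"ports": ports}
    if env:
        body["env"] = env
    return build_request("POST", "/sandbox", body, token)


def run_task(sandbox_id, prompt, agent="opencode", token=None):
    return build_request("POST", f"/v1/sandboxes/{sandbox_id}/tasks",
                         {"prompt": prompt, "agent": agent}, token)


req = create_sandbox([3000])
print(req.get_method(), req.full_url)  # POST http://127.0.0.1:9090/sandbox
# To send it against a running stack: urllib.request.urlopen(req)
```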

How it works

Concern	Choice
Container runtime	Docker + hardened `runc` (cap-drop ALL, `no-new-privileges`, read-only rootfs)
Workspace storage	one bind-mounted directory per sandbox under the data dir (persists)
Edge / preview	Traefik v3 Docker provider — sandboxes self-register their routes
Idle management	stop-on-idle (`docker stop`) + wake-on-request; no warm pool
State	SQLite (WAL); a reconciler converges Docker to the DB on boot
Control plane	one Go binary, shells out to the `docker` CLI over the mounted socket

The control plane runs in a container with the host Docker socket mounted and launches each sandbox as a sibling container on a shared network so Traefik can route to it. Full design: ARCHITECTURE.md.
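
The "reconciler converges Docker to the DB" idea can be illustrated with a few lines of Python. This is a sketch under assumptions, not the project's Go implementation: the `sandboxes` table schema, the status values, and the `plan_reconcile` helper are all hypothetical, and the set of running container ids stands in for what the control plane would learn from Docker.

```python
import sqlite3


def plan_reconcile(db: sqlite3.Connection, running_ids: set) -> list:
    """Compare desired state in SQLite with the containers actually running."""
    desired = dict(db.execute("SELECT id, status FROM sandboxes"))
    actions = []
    for sid, status in sorted(desired.items()):
        if status == "running" and sid not in running_ids:
            actions.append(("start", sid))   # DB says up, Docker says down
        elif status == "stopped" and sid in running_ids:
            actions.append(("stop", sid))    # DB says down, Docker says up
    for sid in sorted(running_ids - set(desired)):
        actions.append(("remove", sid))      # container with no DB row
    return actions


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sandboxes (id TEXT PRIMARY KEY, status TEXT)")
db.executemany("INSERT INTO sandboxes VALUES (?, ?)",
               [("a", "running"), ("b", "stopped"), ("c", "running")])
actions = plan_reconcile(db, running_ids={"b", "c", "orphan"})
print(actions)
# [('start', 'a'), ('stop', 'b'), ('remove', 'orphan')]
```

Because the plan is derived only from the database and the observed state, running it again after a crash or reboot produces the same corrective actions.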

Configuration

Everything is in .env (created from .env.example on install). The defaults run a complete local stack. The knobs you'll touch most:

Variable	Default	What it does
`PREVIEW_DOMAIN`	`localhost`	domain preview URLs hang off
`HTTP_PORT`	`80`	host port Traefik listens on
`SANDBOXED_DATA_DIR`	`/var/lib/sandboxed`	where workspaces + state live
`SANDBOXED_API_BIND`	`127.0.0.1:9090`	where the control-plane API is published
`SANDBOXD_API_AUTH_DISABLED`	`true`	open API for local use; set `false` + tokens for prod

Production / TLS

For a public deployment on a real wildcard domain:

Point *.preview.yourdomain.com at the host.
In traefik/traefik.yml, enable the websecure entrypoint and add a certificate resolver (Let's Encrypt DNS-01 is ideal — one wildcard cert covers every preview host, so you never hit per-host ACME limits).
In .env: PREVIEW_DOMAIN=yourdomain.com, PREVIEW_ENTRYPOINT=websecure, PREVIEW_TLS=true, and enable auth — SANDBOXD_API_AUTH_DISABLED=false with SANDBOXD_API_TOKENS=name:secret.
docker compose up -d.

Uninstall

./uninstall.sh            # stop the stack + remove all sandboxes + network (keeps your data)
./uninstall.sh --images   # also remove the built Docker images
./uninstall.sh --data     # also DELETE all workspaces + state (asks to confirm)
./uninstall.sh --all      # full removal: images + data

Safe by default — it removes only what sandboxed created (containers labelled sandboxed.managed=true, the compose stack, the network) and keeps your workspaces unless you pass --data/--all.

Is this a good foundation for a startup?

Yes — that's exactly the point. If you want to ship an AI app-builder or agent SaaS without first spending months building multi-tenant isolation, preview routing, idle/wake cost control, and agent orchestration, sandboxed gives you that core on day one, on a single inexpensive server, with margins you control. It's a strong, honest starting point — beta-quality, MIT-licensed, and built to be read and extended. Launch lean on it; harden as you grow (next section).

Before you scale hard: what's simple on purpose, and what to harden

sandboxed v1 is tuned for "works anywhere with just Docker, in one command." To keep it that simple, a few things were left basic on purpose. None of them affect the core loop (create → build → preview → sleep → wake → persist) — they're the knobs to tighten once you have real users and real money on the line. Plain version:

Kept simple on purpose	Fine for	Do this when you're scaling / serious
Container isolation (hardened Docker), not full VMs	your own users running their own code	running untrusted strangers' code → put each tenant on its own VM, or use gVisor / Kata / Firecracker
API auth is OFF by default	local development	turn it on (`SANDBOXD_API_AUTH_DISABLED=false` + tokens) and never expose the API port unauthenticated
Preview links are public (anyone with the URL)	demos, sharing	gate sensitive previews (the private-sandbox forward-auth hook)
Open, unlogged network egress	most apps	add firewall / egress rules + logging
Plain-directory workspaces, no disk quota	a single server	add filesystem/volume quotas; plan multi-host sharding
One server, one Docker socket (the control plane is root-equivalent on the host)	starting out	treat the host as a trust boundary, keep it patched, isolate it, and don't co-locate unrelated secrets

The short version for a fast-scaling company: the three that matter most are (1) stronger isolation (VM-per-tenant) if you ever run untrusted code, (2) turn on API auth and lock down the host, and (3) plan for more than one machine. Everything else above is a config change, not a rewrite. Start lean, revisit these as you grow — and PRs are very welcome (CONTRIBUTING.md).

License

MIT. Use it, ship it, sell what you build on it.

Tech ai llm local-inference

Google's new Gemma 4 12B model is designed to run on any laptop with 16GB of RAM

Google released Gemma 4 12B, a new 12-billion-parameter AI model designed to run locally on consumer laptops with 16GB RAM, without significant quality loss compared to its larger 26B MoE counterpart.

Ars Technica

Summary

What: Google's Gemma 4 12B model, part of the Apache 2.0 licensed Gemma 4 family, uses Multi-Token Prediction (MTP) and a streamlined multimodal embedding module for vision and audio to achieve efficiency. It is capable of complex reasoning and agentic workflows, available on Kaggle and Hugging Face, or via tools like LM Studio.

Why it matters: This release signifies a strategic push by Google to democratize powerful AI capabilities, enabling developers and users to run sophisticated models on standard hardware, reducing reliance on cloud resources and addressing memory cost concerns.

Takeaway: Developers with laptops running 16GB of RAM can download and experiment with Google's Gemma 4 12B model from Kaggle or Hugging Face for local AI inferencing.

Deep Dive

Google's new Gemma 4 12B model, released in June 2026, is a 12-billion-parameter AI model designed for local execution.
It requires 16GB of system RAM or VRAM, making it suitable for many consumer laptops.
Google claims it is nearly as capable as the larger Gemma 4 26B Mixture of Experts (MoE) model, despite having roughly half the memory footprint.
The efficiency comes from a new Multi-Token Prediction (MTP) drafter, which calculates possible future tokens during unused processing cycles, improving speed.
It also features a streamlined embedding module for multimodal inputs (text, audio, images), eliminating the bulky intermediate encoder for vision and projecting raw audio signals directly into the vectors used for text tokens.
Gemma 4 12B is capable of complex multistep reasoning and agentic workflows that previously required larger Gemma variants.
The weights are available for download on Kaggle and Hugging Face, and the model can be tried without a download through tools like LM Studio.
This model expands the Gemma 4 family, which previously included the mobile-optimized E2B/E4B and the larger 26B MoE/31B Dense models, filling a mid-range gap.

Decoder

Multi-Token Prediction (MTP): A technique used in AI models to predict multiple future tokens at once, taking advantage of unused processing cycles to increase speed and efficiency.
Multimodality: The ability of an AI model to process and understand information from multiple input types, such as text, images, and audio.
Embedding module: A component in an AI model that converts different types of input data (like images or audio) into a numerical format (vectors) that the core language model can process.
Agentic workflows: AI applications designed to perform multiple steps or complex tasks autonomously, often by breaking down problems and executing a series of actions.

Original Article

The generative AI boom has driven the cost of memory into the stratosphere, and Google is a key part of that trend. So it’s only fitting that Google should offer some less RAM-hungry local AI models. The company has announced the release of a new Gemma 4 model that fills a gap in the lineup that launched earlier this year. The new model is efficient enough that you may be able to run it on a pretty average consumer laptop.

In April, Google released four models in the Gemma 4 family, which also marked the shift to a more open Apache 2.0 license. The initial models included two mobile-optimized options (E2B and E4B) along with a pair of models for more serious work (26B Mixture of Experts and 31B Dense). That left a rather large unserved space in the middle, which is right where the new model falls.

Gemma 4 12B is considerably more capable than the mobile versions, but it won’t require a $20,000 AI accelerator to run locally. Google says Gemma 4 12B is unique in that it can run on many consumer laptops without sacrificing quality. As long as you’ve got a computer with 16GB of system RAM or VRAM, the 12-billion-parameter model will work. That’s about half the total memory footprint of Gemma 4 26B MoE, and Google claims the new model is almost as capable, at least as far as benchmarks go.

Google says the new model is capable of complex multistep reasoning and agentic workflows that previously required the larger Gemma variants. Despite the smaller parameter count, Gemma 4 12B comes with the newly devised Multi-Token Prediction (MTP) drafters, which take advantage of unused processing cycles to calculate possible future tokens. The result is greater speed and efficiency. Google has released optional MTP versions of the other Gemma 4 models, but this is the first one to have MTP out of the box.
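
The article does not spell out how the MTP drafters work internally. One common way to turn cheap guesses at future tokens into a speedup is draft-and-verify (speculative) decoding, and the toy sketch below assumes that scheme. Everything in it is hypothetical: the "models" are plain functions over a fixed sentence, and a real system would verify all drafted tokens in one batched pass.

```python
from typing import Callable, List

Model = Callable[[List[str]], str]  # maps a token prefix to the next token


def speculative_step(target: Model, drafter: Model,
                     prefix: List[str], k: int = 4) -> List[str]:
    """Return the tokens accepted in one draft-and-verify round."""
    # 1. The cheap drafter guesses k tokens ahead.
    draft: List[str] = []
    for _ in range(k):
        draft.append(drafter(prefix + draft))
    # 2. The target checks each guess in order.
    accepted: List[str] = []
    for guess in draft:
        truth = target(prefix + accepted)
        accepted.append(truth)
        if truth != guess:
            break  # first mismatch: keep the target's token, drop the rest
    return accepted


SENTENCE = "the quick brown fox jumps over the lazy dog".split()


def target(prefix: List[str]) -> str:
    return SENTENCE[len(prefix)]


def drafter(prefix: List[str]) -> str:
    # Agrees with the target everywhere except position 3.
    return "cat" if len(prefix) == 3 else SENTENCE[len(prefix)]


print(speculative_step(target, drafter, [], k=4))
# ['the', 'quick', 'brown', 'fox']
```

The output always matches what the target alone would have produced; the drafter only changes how many tokens are settled per round.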

Gemma 4 12B is also more efficient thanks to a new approach to multimodality. The Gemma 4 family is natively multimodal, accepting text, audio, or images as inputs. Most gen AI models—including the other Gemma 4 variants—use dedicated encoders to process non-text inputs and pass that data to the LLM. This works well enough, but it increases latency and memory usage.

With the new mid-weight model, Google has implemented a streamlined embedding module for vision, featuring single-matrix multiplication and positional embedding, which allows the data to pass to the LLM with proper spatial awareness. This eliminates the need for a bulky middleman encoder. For audio, there’s no encoding at all. The developers worked out a method of projecting the raw audio signal into the same vectors used for text tokens.

If you want to check out the new Gemma 4 model, it’s accessible without a download via tools like LM Studio, Google AI Edge Gallery, and more. But the whole idea with Gemma 4 12B is that you can run it locally and on your own terms. If you’ve got the RAM, the model weights are available for download immediately on Kaggle and Hugging Face. It’s just shy of 18GB.

Tech backend python development

What's new in Python 3.15

Python 3.15 introduces significant performance boosts via a revamped JIT compiler and explicit lazy imports, alongside new built-in types like `frozendict` and `sentinel`, and enhanced developer diagnostics.

docs.python.org

Summary

What: Python 3.15, released June 2026, ships with major features including PEP 810 for explicit lazy imports with a new `lazy` keyword to speed up startup times, PEP 814 adding a built-in immutable `frozendict` type, and PEP 661 introducing a `sentinel` type. The JIT compiler sees an 8-9% geometric mean performance improvement on x86-64 Linux and 12-13% on AArch64 macOS, supporting more bytecode operations and improving machine code generation. Error messages are also improved with suggestions and color.

Why it matters: This release shows Python's commitment to addressing long-standing performance bottlenecks like startup time and execution speed, while also refining language features and developer experience, making it more competitive for large-scale and high-performance applications.

Takeaway: Developers should consider using the new `lazy` keyword for imports in large Python applications to potentially improve startup times. Explore `frozendict` for immutable dictionary needs and be aware of the performance gains from the upgraded JIT, especially for compute-intensive tasks.
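
To get a feel for what deferred loading does before moving to 3.15, the standard library already offers `importlib.util.LazyLoader`. The sketch below follows the recipe from the importlib documentation and runs on current Python versions. It approximates the effect of a lazy import and is not the PEP 810 mechanism itself; the `lazy_import` helper name is just for illustration.

```python
import importlib.util
import sys


def lazy_import(name: str):
    """Return a module object whose code runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # schedules the load, does not execute the module yet
    return module


# With PEP 810 this would be written as:  lazy import colorsys
colorsys = lazy_import("colorsys")

# The module body executes here, on first use.
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
```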

Deep Dive

PEP 810: Explicit lazy imports: Introduces a lazy soft keyword for import statements, deferring module loading until the imported name is first used. This reduces application startup times, especially for deep dependency trees, without requiring code restructuring.
PEP 814: Add frozendict built-in type: A new immutable, hashable dictionary type that preserves insertion order but is not a subclass of dict. It's useful for scenarios requiring immutable mappings, e.g., as dictionary keys.
PEP 661: Add sentinel built-in type: A new type for creating unique sentinel values, useful in APIs where None or other common values could be ambiguous.
Upgraded JIT compiler: Delivers 8-9% geometric mean performance improvement on x86-64 Linux and 12-13% on AArch64 macOS. Major upgrades include an LLVM 21 build-time dependency, a new tracing frontend supporting more bytecode operations, basic register allocation, and additional optimizations.
PEP 799: Dedicated profiling package: A new profiling module replaces cProfile and profile (deprecated). It includes profiling.tracing (deterministic) and the new Tachyon statistical sampling profiler.
Tachyon: High frequency statistical sampling profiler: Added as profiling.sampling, offering zero-overhead performance analysis for running Python processes with sampling rates up to 1,000,000 Hz. Supports wall-clock, CPU, GIL-holding, and exception handling time modes, with output to pstats, collapsed stacks for flame graphs, interactive flame graphs, and live TUI mode.
PEP 831: Frame pointers enabled by default: CPython now builds with frame pointers by default on supported platforms, making native stack unwinding faster and more reliable for debuggers and profilers.
PEP 798: Unpacking in comprehensions: List, set, and dictionary comprehensions, and generator expressions now support * and ** unpacking, simplifying code for combining iterables or dictionaries.
PEP 686: UTF-8 as default encoding: Python now uses UTF-8 as the default encoding for I/O operations without an explicit encoding argument, independent of the system environment.
Improved error messages: Interpreter provides more helpful suggestions in AttributeError exceptions, including nested attribute suggestions (e.g., .inner.area), Python equivalents for common methods from other languages (e.g., .append for .push), and suggestions for mutable counterparts when calling mutable methods on immutable types.
New bytearray.take_bytes() method: Allows taking bytes out of a bytearray without copying, optimizing common patterns for returning, emptying, or splitting buffers.
Stable ABI for free-threaded builds (abi3t): C extensions targeting the Stable ABI can now be compiled for free-threaded builds, requiring specific C API changes.
Security enhancements: tarfile module improved against path traversal attacks and other vulnerabilities.
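
The lazy keyword itself needs Python 3.15, so the snippet below is only a rough approximation of the deferred-loading idea for earlier interpreters, built on the standard library's importlib.util.LazyLoader. The helper name lazy_import and the choice of colorsys as the demo module are illustrative assumptions, not part of PEP 810.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose body runs only on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    # Wrap the real loader so that executing the module body is postponed.
    spec.loader = importlib.util.LazyLoader(spec.loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)  # sets up the lazy module; body not run yet
    return module


colorsys = lazy_import("colorsys")                   # nothing executed so far
hue, sat, val = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)   # first use triggers the load
```

With PEP 810 the same intent would be expressed directly on the import statement, so no helper of this kind should be needed on 3.15.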

Decoder

JIT (Just-In-Time) Compiler: A compiler that translates bytecode into machine code during program execution, rather than before, aiming to improve performance by optimizing frequently run code.
Lazy import: A programming technique where a module is not loaded into memory until it is actually needed, rather than at the program's start, to reduce startup time and memory footprint.
Frozendict: An immutable version of Python's dictionary, meaning its contents cannot be changed after creation. It can be used as a key in another dictionary or as an element in a set because it is hashable.
Sentinel value: A special value used to indicate the end of a data set or to represent a unique state, distinct from any valid data values.
Frame pointers: A register or memory location that points to the base of the current stack frame, used by debuggers and profilers to reconstruct call stacks.
Comprehension: A concise way to create lists, dictionaries, or sets in Python (e.g., list comprehension [expr for item in iterable]).
UTF-8: A variable-width character encoding for Unicode, which is capable of encoding all 1,112,064 valid character code points in Unicode.
Stable ABI (Application Binary Interface): A set of conventions for how different parts of a program (or different programs) interact at a binary level, ensuring compatibility across different versions or builds without needing recompilation.
Tachyon: A high-frequency statistical sampling profiler introduced in Python 3.15, designed for low-overhead performance analysis of running Python processes by periodically capturing stack traces.
Multithreading (Free-threaded builds): A build of Python that aims to improve concurrency by allowing multiple native threads to execute Python bytecode in parallel, potentially removing or relaxing the Global Interpreter Lock (GIL).
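
To make the sentinel entry concrete: before a built-in sentinel type, a common idiom is a private object() instance used as the marker. The sketch below shows that older idiom, not the 3.15 built-in, and the names _MISSING and lookup are made up for illustration.

```python
_MISSING = object()  # unique marker that no caller can pass by accident


def lookup(settings, key, default=_MISSING):
    """Return settings[key]; fall back to default only if one was supplied."""
    if key in settings:
        return settings[key]
    if default is _MISSING:
        raise KeyError(key)
    return default  # may legitimately be None


settings = {"timeout": None}
explicit_none = lookup(settings, "timeout")        # the stored value is None
fallback_none = lookup(settings, "retries", None)  # the caller chose None as default
```

Because the marker is compared with `is`, a stored None and a missing key stay distinguishable, which is the ambiguity the sentinel type is meant to address.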

Data database search ai performance

Vector Search in Manticore Search: A Deep Dive

Manticore Search, an open-source search engine, provides a deep dive into its HNSW-based vector search implementation, emphasizing tuning for production use, data safety, and replication.

Manticore Search

Summary

What: Manticore Search details its vector search capabilities, built on the HNSW (Hierarchical Navigable Small World) algorithm within its Columnar Library. It covers creating tables with `float_vector` attributes, specifying `knn_type='hnsw'`, `knn_dims`, and `hnsw_similarity` (L2, COSINE, IP). The article also explains performance tuning with `hnsw_m` and `hnsw_ef_construction`, transaction support, physical and logical backups, and multi-master replication via Galera.

Why it matters: This article highlights the growing maturity of vector search beyond just a feature, presenting it as a full-fledged production system component requiring careful tuning, backup, and replication strategies for reliability and performance at scale.

Takeaway: If you're implementing vector search, consider tuning similarity metrics to align with embedding models, optimizing HNSW parameters (hnsw_m, hnsw_ef_construction) for recall/latency, and planning for physical backups and replication.

Deep Dive

Manticore Search uses the HNSW (Hierarchical Navigable Small World) algorithm for efficient vector search, allowing fast nearest neighbor lookups in large datasets.
It supports three similarity metrics: Euclidean Distance (L2), Cosine Similarity, and Inner Product (Dot Product), emphasizing that the choice should align with how the embedding model was trained.
Key HNSW tuning parameters are hnsw_m (number of connections, affecting accuracy and memory) and hnsw_ef_construction (index build thoroughness, affecting quality and build time).
The system provides transactional support for vector operations, ensuring atomicity and consistency.
Backup options include fast physical backups using manticore-backup or SQL BACKUP, and logical backups via mysqldump for portability.
High availability is addressed through multi-master replication using the Galera library, supporting true multi-master writes and virtually synchronous data consistency.
Performance tips involve understanding RAM vs. Disk chunk storage, using batch inserts, and leveraging Manticore's auto-optimize feature for merging disk chunks.
Real-world applications include smart search (semantic search), recommendation systems, image search, video/sound search, and cross-language search leveraging multilingual embeddings like mBERT or XLM-R.

Decoder

Vector search: A method of searching data by comparing numerical representations (vectors or embeddings) of items to find those that are semantically similar.
Embeddings: Numerical representations of data (like text, images, or audio) in a multi-dimensional space, where the distance or angle between vectors indicates their similarity.
HNSW (Hierarchical Navigable Small World): A graph-based algorithm for efficient approximate nearest neighbor (ANN) search, commonly used in vector databases for fast similarity queries. It builds a multi-layer graph where higher layers have fewer nodes and longer-range connections, allowing quick traversal to find relevant neighbors.
Similarity metrics: Mathematical functions used to quantify how similar two vectors are. Common ones include:
Euclidean Distance (L2): Measures the straight-line distance between two vectors.
Cosine Similarity: Measures the cosine of the angle between two vectors, indicating their orientation similarity.
Inner Product (Dot Product): Calculates the sum of the products of corresponding vector entries, effective when both magnitude and direction matter.
Galera library: A multi-master synchronous replication solution for databases, providing high availability and data consistency across multiple nodes.
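
As a small illustration of the three similarity metrics listed above, the following sketch computes each one in plain Python. It is independent of Manticore's implementation; the function names and sample vectors are chosen only for the example.

```python
import math


def l2_distance(a, b):
    """Straight-line (Euclidean) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Sum of element-wise products; sensitive to magnitude and direction."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores magnitude."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [3.0, 0.0]   # same direction as query, larger magnitude
orthogonal = [0.0, 1.0]

print(l2_distance(query, same_direction))        # 2.0
print(inner_product(query, same_direction))      # 3.0
print(cosine_similarity(query, same_direction))  # 1.0
print(cosine_similarity(query, orthogonal))      # 0.0
```

The pair (query, same_direction) shows why the metric should match the embedding model: cosine treats the two vectors as identical, while L2 and inner product both react to the difference in length.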

Data database performance cloud

Debunking 8 data layout myths: why Liquid Clustering outperforms partitioning

Databricks argues its Liquid Clustering vastly outperforms traditional Hive-style partitioning for modern lakehouses, offering dynamic data organization, row-level concurrency, and significant query speedups.

Databricks

Summary

What: Databricks debunks 8 myths about data layout, advocating for Liquid Clustering over traditional partitioning for lakehouses. Liquid Clustering dynamically organizes data based on evolving clustering keys, eliminating small-file problems and supporting row-level concurrency. Success stories include Arctic Wolf achieving a 7.7x query speedup on a 3.8 PB table and Bolt seeing a 138% write throughput increase.

Why it matters: This article signals a major shift in data lakehouse architecture, moving away from static, rigid partitioning to more flexible, adaptive clustering methods that better handle the dynamic and diverse query patterns of modern AI agents and real-time pipelines.

Takeaway: If you're using Databricks, consider migrating partitioned tables to Liquid Clustering to improve query latency, write throughput, and storage efficiency, especially for petabyte-scale datasets.

Deep Dive

Traditional Hive-style partitioning is rigid: users must commit to a physical data organization at table creation, which in Databricks' analysis leads to over-partitioning and small-file problems in more than 75% of cases.
Liquid Clustering dynamically organizes data using clustering keys that can change over time, and it intelligently selects optimal file organization, including through Automatic Liquid Clustering.
It offers benefits like better skew handling, row-level concurrency, elimination of small-file problems, multi-dimensional clustering, and lower write amplification.
Databricks debunks myths, stating that directory-pruning doesn't exist on modern open table formats like Delta and Iceberg, and Liquid Clustering performs pruning at file granularity using statistics.
Liquid Clustering supports metadata-only operations like DELETEs, COUNT, DISTINCT, and GROUP BY queries, offering significant speedups (e.g., ~90% faster DELETEs).
It performs well at petabyte scale, with OPTIMIZE planning times reduced from 12 hours to 23 minutes on 10 PB tables, and execution 5x faster.
Liquid Clustering is a write-side optimization generating standard Parquet files with min/max stats, making its benefits accessible to any compatible reader (e.g., Apache Spark, DuckDB).
It provides row-level concurrency, addressing the need for write boundaries that partitioning historically offered.
Liquid Clustering is superior to Z-Ordering, which has poor clustering quality and requires unnecessary rewrites.
Selective overwrites are supported on Liquid tables via REPLACE USING and REPLACE ON SQL syntaxes.
Arctic Wolf achieved a 7.7x query speedup and Bolt saw a 138% write throughput increase after migrating to Liquid Clustering.

Decoder

Lakehouse: A data architecture that combines the cost-effectiveness and scalability of a data lake with the data management and query capabilities of a data warehouse.
Hive-style partitioning: A traditional method of organizing data in data lakes by creating directory structures based on column values (e.g., year=2024/month=01/). This can lead to issues like too many small files or inefficient queries if partitions are chosen poorly.
Liquid Clustering: A Databricks feature for Delta Lake and Iceberg tables that dynamically organizes data files based on clustering keys, aiming to optimize data layout for query performance without rigid partitioning.
Clustering keys: Columns chosen to organize data in Liquid Clustering, which guides the data engine to physically store related data together, improving query performance by reducing data scanned.
Row-level concurrency: The ability for multiple writers to simultaneously modify different rows within the same table or file without conflicts, even if those rows are in the same physical data block.

Original Article

Liquid Clustering is the data layout for open table formats that outperforms partitioning while sidestepping its limitations
8 common myths keep teams tied to partitioning, and none of them hold up anymore
Customers using Liquid Clustering report dramatic improvements in query latency, write throughput, storage efficiency, and data freshness, with the largest gains compounding at petabyte scale

Introduction

Laying out data is one of the oldest problems in computing.

For over 15 years, since the advent of Hadoop and Hive, partitioning has been the standard way to physically organize data for processing and analysis. However, today’s Lakehouses serve agents, real-time pipelines, and query patterns that shift faster than any human can re-partition for.

Liquid Clustering is the modern standard and customers are running it at every scale, including dozens with petabyte scale tables in production. In this blog, we’ll cover why Liquid Clustering wins in the Lakehouse. Along the way, we’ll debunk 8 common data layout myths, walk through 3 success stories of teams converting partitioned tables to Liquid Clustering, preview what’s coming next, and show how to get started.

Why Liquid Clustering wins in the modern lakehouse

Hive-style partitioning forces users to commit, at table-creation time, to a physical organization of data that manifests in the file structure. Pick a column with too high cardinality and you get billions of tiny files. Pick the wrong column and queries may get slower, not faster. Either way, you’re stuck rewriting the table. It’s common to get wrong: in our analysis, Hive-style partitioning leads to over-partitioning and small-file problems in more than 75% of cases.

Liquid treats clustering keys as input that the engine uses to guide optimal file organization. Keys can be changed at any time, or intelligently selected through Automatic Liquid Clustering. Cardinality isn’t a constraint, and the layout can evolve over time without unnecessary rewrites.

The benefits of Liquid Clustering all derive from the above principle: better skew handling, row-level concurrency, no small-file problems, multi-dimensional clustering, and lower write amplification.

Small files and data skew with partitioning; good file-sizing and clustering with Liquid

In 2026, the layout should be an implementation detail of the table, with every engine that reads or writes benefitting from it. This is increasingly important as agents enter the Lakehouse, generating and consuming more data than ever. Humans and agents need forgiving interfaces, free of the potential side-effects of Hive-style partitioning.

Debunking 8 common data layout myths

Liquid Clustering became Generally Available in 2024. Since then, we’ve iterated on it non-stop with customers running it at scale. In that time, some common myths about Liquid Clustering and partitioning have persisted, and today we want to debunk them.

Myth #1: Partitioning is faster because it can prune directories instead of files

The myth goes: With partitioning, Spark or other engines can prune whole directories without opening any files inside of them.

Reality: Directory-pruning does not exist on modern open table formats like Delta and Iceberg. Delta, for example, uses a transaction log to track every data file along with per-column statistics, and pruning happens against those statistics, not the directory structure. The engine never lists directories to plan a query. It reads the transaction log, evaluates filters against statistics, and skips files that don’t match. Liquid Clustering uses the same mechanism. Whether your data lives in `date=x/hour=y/` or a flat directory of clustered files, the engine prunes at file granularity. There is no directory-level shortcut to lose.
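
The statistics-based pruning described above can be pictured with a toy model. This is a simplified sketch, not Delta's actual log format: the FileStats record, the files_to_scan helper, and the file paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    min_value: str   # smallest value of the filter column in this file
    max_value: str   # largest value of the filter column in this file


def files_to_scan(log, lo, hi):
    """Keep only files whose [min, max] range overlaps the filter [lo, hi]."""
    return [f.path for f in log if f.max_value >= lo and f.min_value <= hi]


# Hypothetical log entries: one Hive-style path and two flat clustered files.
log = [
    FileStats("date=2026-01-01/part-0.parquet", "2026-01-01", "2026-01-01"),
    FileStats("clustered-a.parquet", "2026-01-02", "2026-01-05"),
    FileStats("clustered-b.parquet", "2026-01-06", "2026-01-09"),
]

selected = files_to_scan(log, "2026-01-04", "2026-01-04")
print(selected)  # ['clustered-a.parquet']
```

The path string plays no part in the decision; only the per-file ranges do, which is why the directory layout offers no extra shortcut in this model.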

Myth #2: Partitioning is better when filtering on a low-cardinality column

The myth goes: For a column with a small number of distinct values, partitioning gives you perfect data separation and good file sizes.

Reality: Liquid Clustering automatically detects when to apply low-cardinality optimizations. For example, if you cluster by (date, user_id), and date has low cardinality, the system aims for each file to contain rows from only a single date. Higher-cardinality columns, like user_id, are then automatically used for finer-grained sorting within each date's files, without having to rely on other sorting techniques like Z-Ordering.

Low-cardinality Liquid Clustering optimization

We saw the following improvements while benchmarking this Liquid optimization on a real-world data warehousing benchmark: 35% lower time for clustering and 22% faster query times.

Additionally, Liquid Clustering is designed to be better than partitioning when clustering on a high-cardinality column, as it always tries to create files of a good size.
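
The (date, user_id) behaviour described under this myth can be sketched as a toy file planner. It is only a mental model under stated assumptions, not Databricks' algorithm: plan_files, max_rows_per_file, and the sample rows are invented for the example.

```python
from itertools import groupby


def plan_files(rows, max_rows_per_file):
    """Group rows by date, sort by user_id within each date, then cut files."""
    ordered = sorted(rows, key=lambda r: (r["date"], r["user_id"]))
    files = []
    for _, group in groupby(ordered, key=lambda r: r["date"]):
        group = list(group)
        # Each file holds rows from a single date, ordered by user_id.
        for start in range(0, len(group), max_rows_per_file):
            files.append(group[start:start + max_rows_per_file])
    return files


rows = [
    {"date": "2026-03-02", "user_id": 7},
    {"date": "2026-03-01", "user_id": 9},
    {"date": "2026-03-01", "user_id": 2},
    {"date": "2026-03-02", "user_id": 1},
    {"date": "2026-03-01", "user_id": 5},
]
files = plan_files(rows, max_rows_per_file=2)
```

Here the five rows end up in three files, none of which mixes dates, and a filter on user_id within a date can still skip files because each file covers a narrow user_id range.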

Myth #3: Liquid Clustering doesn’t support metadata-only operations

The myth goes: Metadata-only operations are uniquely supported by partitioning. A DELETE aligned with partition boundaries only updates the table’s metadata, and aggregates on partition columns can be computed without scanning files. Liquid Clustering can’t do the same.

Reality: Liquid Clustering also supports metadata-only operations including DELETEs, COUNT, DISTINCT, and GROUP BY queries. The engine uses the same per-file min/max stats it uses for data skipping to determine when a query’s answer can be computed from metadata alone. In our benchmarks, metadata-only DELETEs on Liquid Clustered tables ran ~90% faster than full-rewrite DELETEs. Other metadata-only aggregate queries saw up to 27x speedups.
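
One way to picture a metadata-only decision is the sketch below. It assumes each log entry carries min/max values plus a row count; FileEntry, plan_delete, and count_rows are hypothetical names, and a real engine's rules are more involved.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    path: str
    min_date: str
    max_date: str
    num_rows: int   # assumed to be recorded alongside the min/max stats


def plan_delete(log, lo, hi):
    """Classify files for a DELETE of all rows with lo <= date <= hi."""
    drop, rewrite, keep = [], [], []
    for f in log:
        if f.max_date < lo or f.min_date > hi:
            keep.append(f.path)      # no row can match
        elif lo <= f.min_date and f.max_date <= hi:
            drop.append(f.path)      # every row matches: remove via metadata only
        else:
            rewrite.append(f.path)   # partial overlap: data must be touched
    return drop, rewrite, keep


def count_rows(log):
    """COUNT(*) answered from per-file row counts, without opening any file."""
    return sum(f.num_rows for f in log)


log = [
    FileEntry("a.parquet", "2026-01-01", "2026-01-01", 100),
    FileEntry("b.parquet", "2026-01-02", "2026-01-03", 250),
    FileEntry("c.parquet", "2026-01-04", "2026-01-04", 50),
]
drop, rewrite, keep = plan_delete(log, "2026-01-01", "2026-01-02")
total = count_rows(log)
```

The tighter the per-file ranges are, the more files fall into the drop bucket, which is where good clustering helps these operations.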

Myth #4: Liquid Clustering doesn’t work well at petabyte scale

The myth goes: OPTIMIZE on a PB-size table can run for hours, and the cost of maintenance is too high.

Reality: We’ve made a number of significant improvements to OPTIMIZE, and dozens of customers now have PB-scale Liquid Clustered tables in production. Two years ago, planning, the first phase of OPTIMIZE, could take up to 12 hours on a 10 PB Liquid table in some cases. Since then, we’ve reduced planning time to 23 minutes. Execution, the second phase of OPTIMIZE, got 5x faster on a Medium DBSQL cluster.

Myth #5: Liquid Clustering only benefits a subset of readers

The myth goes: Liquid Clustering only benefits Databricks readers of UC managed Delta tables.

Reality: Liquid Clustering is a write-side optimization. It’s how the engine organizes files for efficient data skipping. The output is standard Parquet files with min/max stats, written into open table formats like Delta/Iceberg. Any compatible reader (e.g. open-source Apache Spark, DuckDB, etc.) can use those stats to skip files. Liquid Clustering is available on both external / managed and Delta / Iceberg tables, and the benefit is applicable regardless of the reader.

Myth #6: Partitioning is necessary for concurrent ETL

The myth goes: Concurrent ETL needs write boundaries. Without partitioning, two writers updating the same table risk colliding, and Delta/Iceberg concurrency control forces one of them to retry or fail. Partition and give each writer its own slice of the table, so two pipelines never touch the same files.

Reality: Operating at partition granularity was a workaround for an older concurrency model. Unlike partitioning, which only has file-level concurrency, Liquid provides row-level concurrency. Two writers updating different rows no longer conflict, even if those rows live in the same file. This removes one of the main reasons teams partitioned tables: maintaining write boundaries to avoid serialization. With Liquid Clustering, ETL can easily operate concurrently against the same table.
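
The difference between the two concurrency models can be sketched as two conflict checks over the writers' write sets. This is a conceptual illustration only; the function names and the (file, row_id) representation are assumptions, not how Delta tracks changes.

```python
def file_level_conflict(writer_a, writer_b):
    """Conflict if both writers touched any common file."""
    files_a = {file for file, _ in writer_a}
    files_b = {file for file, _ in writer_b}
    return bool(files_a & files_b)


def row_level_conflict(writer_a, writer_b):
    """Conflict only if both writers touched the same row of the same file."""
    return bool(set(writer_a) & set(writer_b))


# Each write set is a collection of (file, row_id) pairs; all values are made up.
etl_1 = {("part-0.parquet", 10), ("part-0.parquet", 11)}
etl_2 = {("part-0.parquet", 42)}

print(file_level_conflict(etl_1, etl_2))  # True: same file, one writer must retry
print(row_level_conflict(etl_1, etl_2))   # False: different rows, both can commit
```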

Myth #7: Z-Ordering makes up for partitioning’s shortcomings

The myth goes: Partitioning handles the partition column’s filters, and Z-Ordering handles the rest. By running OPTIMIZE ZORDER BY, the engine sorts data for optimal skipping on filters that don’t align with the partition scheme.

Reality: Z-Ordering doesn’t save partitioning. In fact, it has its own structural problems.

The first is poor clustering quality. Z-Order doesn’t maintain a true ordering across the table. Values for the same column can get spread across many files, so per-file min/max ranges are wider and queries skip fewer files than they would with Liquid.
The second is unnecessary rewrites. Z-Order has to be rerun periodically as new data lands, and each rerun rewrites large amounts of old, possibly already-clustered data to restore clustering quality. With continuous ingestion, the cost of keeping data well-clustered with Z-Order grows along with the table.

Liquid clusters incrementally, including at write time, so the layout stays optimal without unnecessary rewrites.

Myth #8: Partitioning is necessary for selective data overwrites

The myth goes: Being able to selectively overwrite data is only available through Dynamic Partition Overwrites.

Reality: Selective overwrites work on Liquid tables natively. Databricks supports REPLACE USING and REPLACE ON, two SQL syntaxes for selectively overwriting data on any data layout: Liquid Clustered, partitioned, or plain unclustered tables. Unlike Dynamic Partition Overwrite, which requires a Spark config, REPLACE USING and REPLACE ON can be used on any compute: classic clusters, SQL warehouses, and Serverless. The operation is atomic and matches on any column you choose.

Success stories: migrating from partitioning to Liquid Clustering

7.7x query speedup on Arctic Wolf’s 3.8 PB security telemetry table

Arctic Wolf runs a 3.8+ PB security telemetry table ingesting 1+ trillion events per day, where threat hunters depend on fresh data to detect active attacks.

After migrating from partitioning to Liquid Clustering on Unity Catalog managed tables with Predictive Optimization, Arctic Wolf saw:

90-day queries drop from 51 seconds to 6.6 seconds
File count dropped from 4M to 2M
Data freshness improved from hours to minutes

Read and write improvements on critical CDC tables for Bolt

Bolt recently tried Liquid Conversion (currently in Private Preview), which converts partitioned tables to Liquid in-place using ALTER TABLE .. REPLACE PARTITIONED BY WITH CLUSTER BY. They observed the following read and write benefits on a TB-scale CDC table after converting to Liquid Clustering:

Write throughput (rows/sec) increased by 138%
Read times were reduced by up to 63%, with an average of 21% reduction across 9 representative queries

Liquid Clustering dramatically reduced the work that each write was doing, increasing our throughput significantly on our most critical CDC table. Reads also improved across the board. The best thing was: we ran the conversion from partitioning alongside live ingestion with zero downtime. With this, Liquid Clustering provided us exactly the kind of performance and reliability we needed at platform scale. — Marcin, a senior platform engineer at Bolt

5.9x speedup in query time on a petabyte-scale internal workload

We run a 1.1 PB table internally that's queried thousands of times a day, mostly by engineers running production investigations and observability dashboards. Originally it was partitioned by date and hour, assuming time-range scans would dominate. However, that assumption turned out to be incomplete. While time-range scans were common, the table was also frequently queried by source and id, forcing the engine to scan every file in the relevant date and hour partitions to find a handful of rows.

Adding source and id as partitions wasn’t viable, because there were too many distinct values. This would have created billions of tiny files. Liquid Clustering removed the trade-off, allowing clustering on time and the additional identifier columns simultaneously, while maintaining good file sizes.

	Layout
Before	Partitioned by date, hour
After	Clustered by date, hour, source, id

Benchmarks showed massive improvements across 16 representative production queries:

Metric	Before (partitioned)	After (Liquid)	Improvements
Wall Clock Time	406s	70s	5.9x speedup
Bytes Read	3.5 TB	0.48 TB	86% fewer bytes read

The table itself got smaller too. Total size dropped from 1.1 PB to 0.8 PB, a 27% reduction with no change in the underlying data. Better-clustered files compress more efficiently, and the small-file tax that comes with over-partitioning disappears.

What’s coming next for Liquid Clustering

Optimizing Liquid-to-Liquid joins: up to 51% faster with 87% less shuffle

Today, joining Liquid tables on their clustering columns can require a full data shuffle, even when the data is already organized by those columns. Co-clustered joins (now in Private Preview) remove that shuffle automatically. On a real-world data warehousing benchmark, a Liquid-to-Liquid join ran ~51% faster (28 minutes → 14 minutes) and shuffled 87% less data (1.2 TiB → 150 GiB) than the same query without the optimization.

Easy Liquid Conversion of partitioned tables

Before, converting a partitioned table to Liquid Clustering required a full table rewrite and downstream breaking changes with REPLACE TABLE or a cutover with dual writes and planned downtime. We’re introducing a new command (now in Private Preview) that makes this conversion easier, minimizing both downtime and rewrites.

Getting started with Liquid Clustering

Create a table with Liquid Clustering:

Or, if you’re using UC managed tables with Predictive Optimization, use Automatic Liquid Clustering to intelligently select clustering keys based on your workload and query patterns:

Liquid Clustering is the layout for the modern Lakehouse. Try it on your next table, or reach out to your account team today to try the Private Previews for partitioned-to-Liquid Conversion and Co-Clustered joins!

Don’t forget to catch us at DAIS!

Optimize Lakehouse Cost and Performance with Intelligent Storage and Liquid Clustering

Data database performance backend redis

Diving deep into Redis's new array data type

Redis 8.8 introduces a new native Array data type, designed by Salvatore Sanfilippo, providing constant-time positional access for sparse or dense arrays where the index itself carries semantic meaning.

Redis

Summary

What: Redis 8.8 introduces the Array data type, designed by Salvatore Sanfilippo, addressing the need for constant-time positional access. Unlike lists (O(N) access) or hashes (no range queries), Arrays efficiently handle dense and extremely sparse data by dividing the index space into 4096-slot groups, only allocating memory for groups with data. Commands include `ARGETRANGE`, `ARSCAN`, `ARCOUNT`, `ARLEN`, `ARRING` (for fixed-size ring buffers), `ARGREP` (server-side pattern matching with regex), `AROP` (server-side aggregations like SUM, MIN, MAX), and `ARDEL`/`ARDELRANGE`.

Why it matters: This new data type addresses a critical gap in Redis's offerings for use cases where the numerical index itself is semantically important, pushing Redis further into domains previously requiring complex workarounds or external databases, while maintaining its core strengths of speed and memory efficiency.

Takeaway: If you're using Redis and managing data where numeric indices have intrinsic meaning (e.g., log line numbers, port IDs, event sequences), explore the new Array type in Redis 8.8 for more efficient and idiomatic solutions compared to lists or hashes.

Deep Dive

The new Redis Array type, introduced in Redis 8.8, provides effectively constant-time (O(1)) positional access by index, filling a gap where other data types like lists (O(N) for index access) or hashes (no range queries) fall short.
It is designed for scenarios where the index itself carries semantic meaning, such as document line indexing, stack trace analysis, or workflow steps.
The internal implementation divides the index space into groups of 4096 slots, only allocating memory for groups that contain data, making it highly efficient for sparse arrays. Empty groups cost only 8 bytes.
Small integers, floats, and short strings are stored directly inside pointer slots, optimizing memory usage for dense arrays of small values.
For very large arrays (indices >= 8,388,608), the structure silently promotes to a three-level hierarchy (superdir -> block -> group) to maintain O(1) access without unbounded directory growth.
Key commands include:
ARGETRANGE: Returns a range of values, including empty slots.
ARSCAN: Iterates only over occupied slots in a range, skipping gaps.
ARCOUNT: Returns the total count of active entries in O(1).
ARLEN: Returns the highest occupied index + 1, indicating data extent.
ARRING: Manages a fixed-size ring buffer for event logs atomically.
ARGREP: Performs server-side pattern matching (EXACT, MATCH, GLOB, RE) on occupied slots, skipping empty ones.
AROP: Executes server-side aggregations (SUM, MIN, MAX, AND, OR, XOR, USED, MATCH) over a specified range.
ARDEL, ARDELRANGE: Deletes single entries or ranges, freeing memory for emptied slices immediately.
The article contrasts Arrays with Lists, Hashes, Sets, and Sorted Sets, highlighting that Arrays are superior when the numeric index is part of the data model's meaning, not just an implementation detail.
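
The grouped layout described above can be sketched in a few lines. This is a toy model, not Redis's implementation: it uses a Python dict as the directory, treats None as the empty slot (so None cannot be stored), and the class and method names are invented for the example.

```python
GROUP_SIZE = 4096  # slots per group, as described above


class SparseArray:
    def __init__(self):
        self._groups = {}   # group number -> list of GROUP_SIZE slots
        self._count = 0     # number of occupied slots

    def set(self, index, value):
        if value is None:
            raise ValueError("None marks an empty slot in this sketch")
        group_no, slot = divmod(index, GROUP_SIZE)
        group = self._groups.get(group_no)
        if group is None:   # allocate a group only when it first receives data
            group = self._groups[group_no] = [None] * GROUP_SIZE
        if group[slot] is None:
            self._count += 1
        group[slot] = value

    def get(self, index):
        group_no, slot = divmod(index, GROUP_SIZE)
        group = self._groups.get(group_no)
        return None if group is None else group[slot]

    def count(self):
        """Number of occupied slots, tracked incrementally."""
        return self._count

    def scan(self, start, end):
        """Yield (index, value) for occupied slots in [start, end], skipping gaps."""
        for group_no in sorted(self._groups):
            base = group_no * GROUP_SIZE
            if base > end or base + GROUP_SIZE <= start:
                continue    # whole group lies outside the requested range
            for slot, value in enumerate(self._groups[group_no]):
                index = base + slot
                if start <= index <= end and value is not None:
                    yield index, value


arr = SparseArray()
arr.set(3, "a")
arr.set(1_000_000, "b")   # far away: only one extra group is allocated
occupied = list(arr.scan(0, 2_000_000))
```

Two writes a million indices apart allocate just two groups here, which mirrors the point that cost follows the data present rather than the size of the index space.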

Decoder

Redis Array: A new native data type in Redis 8.8 designed for efficient storage and retrieval of data where the numerical index is semantically important, offering constant-time access and optimized handling of sparse data.
Constant-time positional access (O(1)): The ability to retrieve an element at a specific index in a data structure with a fixed amount of time, regardless of the size of the data structure.
Sparse array: An array where most of the elements have a default or null value, and only a small fraction of elements hold significant data. The Redis Array optimizes storage by not allocating memory for empty slots.
Ring buffer: A fixed-size circular data buffer where new data overwrites the oldest data when the buffer is full, often used for event logs or streaming data.
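
The ring buffer behaviour defined above can be seen with Python's collections.deque. This shows the general data structure only and makes no claim about ARRING's exact semantics.

```python
from collections import deque

events = deque(maxlen=3)    # fixed capacity of three entries
for event in ["boot", "login", "query", "logout"]:
    events.append(event)    # once full, the oldest entry is dropped

print(list(events))  # ['login', 'query', 'logout']
```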

Data opensource rust python

dbt Core v2 is here: still open source, now rebuilt for what's next

dbt Core v2.0 open-sources the Rust-based Fusion engine runtime under Apache 2.0, promising faster parsing, Parquet artifacts, and a unified foundation with Fusion.

dbt Labs

Summary

What: The dbt Core v2.0 alpha release is built on the same Rust-based runtime as dbt Fusion, with much of Fusion's code open-sourced under Apache 2.0. This brings significant parse time improvements, new Parquet artifacts, revamped local documentation, ADBC-powered adapter building, and simpler installations, while Fusion remains the recommended binary for most users.

Why it matters: This move signifies dbt Labs' commitment to a more performant and unified dbt framework, leveraging Rust and modern data technologies, and clarifying the roles of dbt Core (fully open-source) and dbt Fusion (enhanced binary) in the ecosystem.

Takeaway: If you use dbt, consider migrating to dbt Core v1.12 first and running `dbt parse --use-v2-parser` to prepare for the v2.0 upgrade, or install dbt Fusion for the best free CLI experience.

Deep Dive

Unified Foundation: dbt Core v2.0 merges with the dbt Fusion engine's Rust-based runtime, placing much of the previously ELv2-licensed Fusion code under the Apache 2.0 license.
Key Features: Includes significant parse time improvements for large projects, a tighter language specification, new high-performance Parquet artifacts for data, a revamped local documentation experience, streamlined adapter building via ADBC and Arrow, and simpler installations by reducing Python virtual environment complexities.
Distributions: dbt Fusion will continue as the recommended precompiled binary for most users, offering additional free and premium features, while dbt Core v2.0 serves teams requiring fully open-source code or custom builds.
Strategic Shift: This change aims to consolidate development efforts on a single, more performant Rust-based engine, ensuring that all users benefit from ongoing innovations.
Backward Compatibility: The existing Python-based dbt Core v1.x will still be available, but future capabilities will increasingly require upgrading to v2.x.
Migration Path: Users are advised to upgrade to dbt v1.12 first, which includes the Fusion-powered parser for compatibility checks, before moving to v2.0.

Decoder

dbt (data build tool): An open-source framework that helps data analysts and engineers transform data in their warehouse by writing SQL SELECT statements.
dbt Core: The open-source command-line interface for dbt.
dbt Fusion: An enhanced version of the dbt CLI, previously under ELv2 license, that includes performance improvements and additional features, now sharing its Rust runtime with dbt Core v2.0.
Apache 2.0 license: A permissive free software license by the Apache Software Foundation.
ELv2 license: The Elastic License v2, a source-available license that is not open source under the OSI definition, often used for software that also has a commercial offering.
Parquet artifacts: Data stored in the Apache Parquet column-oriented data file format, known for efficient data compression and encoding schemes.
ADBC (Arrow Database Connectivity): A specification for a set of high-performance APIs for database connectivity using Apache Arrow for data transfer.

Data securityaiagentspolicy

Authorization for AI agents: What to build before the EU AI Act deadline

Upcoming EU AI Act regulations require externalized authorization and robust audit trails for AI agents, demanding new architectural patterns for identity and runtime policy enforcement.

Cerbos

Summary

What: This article, from the Cerbos blog, outlines three critical gaps for AI agent security: unique per-instance agent identities with sponsor-tied lifecycles, audit trails that survive sub-agent delegation, and an external runtime policy engine to gate agent-to-tool calls. It emphasizes that the EU AI Act's high-risk obligations (Articles 9, 10, 12, 13), though possibly deferred to December 2027, mandate these controls, specifically pointing out that policy must be decoupled from the agent itself.

Why it matters: The impending EU AI Act is forcing a crucial architectural shift in how AI agents are secured and governed, moving authorization and auditing out of the probabilistic agent's control into external, auditable systems.

Takeaway: If your organization is developing or deploying AI agents, inventory existing agents, assign each a named human sponsor, put an external runtime policy engine in front of agent-to-tool calls, and wire up an audit chain that survives delegation, ideally before the EU AI Act's high-risk obligations take effect.

Deep Dive

The "Gap": Jonathan Care of KuppingerCole highlights that while frameworks govern what models say, almost nothing governs what agents do. This is the core problem.
Three Layers of Gaps:
Identity: Each agent instance needs a unique, short-lived identity with credentials tied to a human sponsor's lifecycle, rather than a single long-lived API key per "the agent."
Audit: Existing audit trails break with agent-to-sub-agent delegation; a "chain of custody" is missing, detailing human authorization, purpose, and data flow.
Orchestration (Runtime Policy): The most critical and least mature area. An external policy engine, not the agent itself, must decide if an agent can invoke a tool with specific arguments. This provides a decoupled, fail-closed security model.
EU AI Act Urgency: Articles 9 (risk management), 10 (data governance), 12 (record-keeping), and 13 (transparency) of the EU AI Act impose obligations on AI system providers. While the deadline for high-risk systems might slip from August 2026 to December 2027, the architectural requirement for external runtime controls remains.
Why External Policy Matters: Controls inside the agent (e.g., prompt guardrails) are part of the probabilistic system being constrained. Externalizing policy allows security teams to reason about it, provides a single engine across all agents, unified audit logs, and consistent change management.
Convergence: Vendors, analysts (Care, Kuppinger, CoSAI), and standards bodies (OpenID Foundation's AuthZEN) are independently converging on this same architecture: agent-scoped identity, delegation-aware audit, and external runtime policy.
Immediate Steps for Security Leads:
Inventory: Identify all existing "shadow AI" agents within the organization.
Sponsor Agents: Assign a named human owner to each agent whose departure triggers the agent's deactivation.
Externalize Policy: Implement a runtime policy engine (like Cerbos) to evaluate agent-to-tool calls outside the agent's reasoning loop.
Wire Audit Chain: Ensure every agent action records the human sponsor, original purpose, and the policy decision.
Test Fail-Closed: Verify that if the policy engine is unreachable, the agent fails safely (i.e., does not proceed with the action).
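
The last three steps (externalize policy, wire the audit chain, test fail-closed) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Cerbos API or any vendor's implementation: `AgentIdentity`, `ToolGate`, `example_policy`, and the refund threshold are all hypothetical names and rules invented for the example, and the policy engine is modeled as a plain Python callable rather than a network service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class AgentIdentity:
    """Per-instance identity tied to a named human sponsor (hypothetical shape)."""
    instance_id: str
    sponsor: str
    purpose: str


# The policy decision point (PDP) is modeled as a callable for illustration.
PolicyDecisionPoint = Callable[[AgentIdentity, str, Dict[str, Any]], bool]


def example_policy(agent: AgentIdentity, tool: str, args: Dict[str, Any]) -> bool:
    """Hypothetical rules: lookups are allowed, refunds only up to 100."""
    if tool == "lookup_order":
        return True
    if tool == "send_refund":
        return args.get("amount", 0) <= 100
    return False  # unknown tools are denied by default


@dataclass
class ToolGate:
    """Enforcement point that sits between the agent and its tools."""
    pdp: PolicyDecisionPoint
    audit_log: List[Dict[str, Any]] = field(default_factory=list)

    def call(self, agent: AgentIdentity, tool: str, args: Dict[str, Any],
             impl: Callable[..., Any]) -> Any:
        try:
            allowed = bool(self.pdp(agent, tool, args))
            reason = "policy decision"
        except Exception as exc:
            # Fail closed: an unreachable or crashing engine means deny.
            allowed = False
            reason = f"fail-closed: {exc}"
        # Record sponsor, purpose, and decision for every attempted action.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": agent.instance_id,
            "sponsor": agent.sponsor,
            "purpose": agent.purpose,
            "tool": tool,
            "args": dict(args),
            "allowed": allowed,
            "reason": reason,
        })
        if not allowed:
            raise PermissionError(f"{tool} denied ({reason})")
        return impl(**args)
```

The design point is that the decision is made outside the agent's reasoning loop: the agent can only request a tool call through `ToolGate.call`, and a denied or undecidable request never reaches the tool implementation. A fail-closed test then amounts to passing a `pdp` that raises and checking that the tool was not invoked.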

Decoder

EU AI Act: The European Union's comprehensive regulation on artificial intelligence, aiming to ensure AI systems are safe, transparent, non-discriminatory, and environmentally friendly. High-risk AI systems (e.g., in critical infrastructure, law enforcement) face stringent requirements.
Externalized Authorization: A security pattern where the decision logic (Policy Decision Point, PDP) for authorization is separated from the application enforcing it (Policy Enforcement Point, PEP), allowing centralized management and consistent application of policies.
Least-Privilege: A security principle where a user, program, or process is given only the minimum access levels or permissions necessary to perform its function.
Fail-Closed Runtime: A system design where if a security or authorization component fails or is unreachable, access is denied by default, preventing unauthorized actions. This is opposed to "fail-open," where failure grants access.
PEP/PDP: Policy Enforcement Point / Policy Decision Point. PEP enforces decisions, PDP makes decisions.
RAG pipeline: Retrieval-Augmented Generation, an AI architecture that retrieves relevant information from a knowledge base before generating a response, enhancing accuracy and reducing hallucinations.
MCP server: Model Context Protocol server, a service that exposes tools and data sources to AI agents through the open Model Context Protocol.

Design aiuxenterprise

What Will AI-first UX Look Like?

AI-first UX is evolving beyond chatbots to integrated, agentic systems that replace traditional forms and dashboards with conversational interfaces and collaborative AI agents.

InfoWorld

Summary

What: The article, featuring insights from leaders like Vishal Sood (Typeface) and Hector Ouilhet Olmos (AWS Solutions), describes how AI-first UX will collapse enterprise app sprawl. It will shift from disconnected tools to orchestrated environments where AI agents carry context, enabling conversations to replace forms, AI to generate reports, and workflows to become agentic collaborations. Examples include Workday's Sana and Anthropic's Claude Cowork.

Why it matters: This signals a fundamental paradigm shift in enterprise software design, where AI moves from being an additive feature to the core interaction model, demanding new approaches to development, testing, and orchestration for AI agents.

Takeaway: Developers should begin exploring multiagent frameworks and orchestration platforms to prepare for building collaborative human-AI agent experiences rather than traditional form-based applications.

Deep Dive

AI-first UX aims to collapse enterprise app sprawl by integrating AI agents that carry context across workflows.
The shift is from disconnected tools to orchestrated systems, blending conversational interfaces, visual workspaces, and agentic orchestration.
Traditional user interfaces like forms will be replaced by adaptive conversations and "interviews" where AI pre-fills information and gathers data dynamically.
Reporting and dashboards will evolve from static visualizations to narrative copilots generated by AI, explaining insights and suggesting actions.
Workflows will transform into collaborations between humans and AI agents, with AI orchestrating multi-step processes across systems.
This requires a fundamental shift from the "desktop metaphor" to designing interfaces that mimic human collaboration patterns like negotiation and interruption.
New platforms like Workday's Sana (with 300+ skills) and Anthropic's Claude Cowork (with plug-ins for legal/marketing) are examples of this shift.
Development teams will need to update observability standards, automate AI agent testing, and review multiagent frameworks and orchestration platforms.

Decoder

Agentic AI: AI systems capable of perceiving their environment, making decisions, and performing actions over an extended period to achieve specific goals, often involving multiple steps and interactions.
SaaS sprawl: The uncontrolled proliferation of Software-as-a-Service applications within an organization, leading to inefficiencies, increased costs, and data management challenges.

Design aicareerpolicy

Who Survives AI? Useful Insights from Walter Terruso

Italian interior designer Walter Terruso controversially predicts AI will decimate mid-level creative jobs, leaving an "outliers economy" where only truly exceptional designers survive.

DesignWanted

Summary

What: Walter Terruso, an Italian interior designer and AI consultant, warns that AI will create an "outliers economy," displacing mid-level professionals. He advises designers to actively use AI tools like Claude and Midjourney (spending around €500/month), develop unique perspectives, and critically evaluate AI outputs, emphasizing that only genuinely exceptional talent will thrive.

Why it matters: Terruso's blunt assessment challenges the common optimistic narrative about AI's impact on creative professions, highlighting a significant and often unspoken concern about job displacement for competent but not exceptional talent, pushing for a reevaluation of skill development and education.

Takeaway: Seriously engage with AI tools like Claude and Midjourney now, not to replace thinking, but to understand their capabilities and limitations, and focus on developing an irreplaceable, unique creative perspective and critical judgment.

Deep Dive

Walter Terruso, an Italian interior designer with 20 years of experience, is now an AI consultant helping studios integrate agentic AI workflows.
He argues that AI will create an "outliers economy," where only designers with exceptional creativity and unique perspectives will thrive.
Mid-level professionals, who perform competent but not extraordinary work, will face significant challenges as their technical and junior tasks are automated first.
Terruso estimates his personal AI tool spending at around €500 a month for intensive client work using Claude, Midjourney, and Higgsfield.
He advises students to start with more affordable plans (e.g., Claude for €20/month, Midjourney for €10/month), emphasizing time investment in learning.
AI currently struggles with on-site visits, tactile material judgment, client management for difficult briefs, and the nuanced judgment from experience.
He notes that capabilities like spatial planning, previously an AI weakness, are rapidly improving.
Terruso stresses three points for design students: use AI tools seriously now, do not let AI replace critical thinking, and figure out what unique perspective they have.
He suggests that studios' cost savings from AI will ultimately migrate to the big tech companies providing the AI tools, not necessarily benefiting studios or clients.
Terruso criticizes current political leadership for lacking the cultural understanding to address AI's societal impact and educational institutions for not adequately preparing students.
He proposes that university education should shift from content delivery (which AI can do) to fostering critical judgment and unique perspectives through collaborative problem-solving with professors.

Decoder

Agentic AI: AI systems designed to perform a sequence of tasks autonomously, making decisions and taking actions to achieve a given goal, rather than just responding to individual prompts.
Claude: An AI assistant developed by Anthropic, known for its conversational abilities and context window.
Midjourney: An AI program and service that generates images from natural language descriptions ("prompts").

Design aiuxenterpriseopensource

Design Systems that Document AI

Only 26 out of 156 public design systems meaningfully document AI, though leaders like IBM, AWS, and Microsoft independently converged on four core principles for AI-powered experiences.

The Design System Guide

Summary

What: A survey of 156 public design systems revealed only 26 address AI. Leading teams, including IBM, AWS, GitLab, and Microsoft, independently converged on four core principles: always mark AI-generated content, explain it in layers, keep humans in control, and design for failure. IBM Carbon for AI and AWS Cloudscape are noted for their robust guidelines.

Why it matters: This analysis reveals that while the industry is rapidly adopting AI, design system practices are lagging, yet leading companies are independently establishing crucial guidelines for ethical and usable AI. This convergence signals emerging best practices for integrating AI responsibly into user experiences.

Takeaway: If you manage or contribute to a design system, begin incorporating guidelines for marking AI-generated content, explaining AI decisions, ensuring human control and override capabilities, and explicitly planning for AI failures and recovery paths.

Deep Dive

A survey of 156 public design systems found only 26 address AI, an increase from a year ago.
Leading companies such as IBM, AWS, GitLab, Red Hat, SAP, Workday, and Apple, despite no coordination, have converged on four core principles for documenting AI in design systems.
These four rules are: 1) Always mark AI-generated content, 2) Explain AI in layers (What, Why, How), 3) Keep humans in control, and 4) Design explicitly for failure.
The article introduces an "AI-readiness maturity model" for design systems, ranging from Level 0 (no AI guidance) to Level 5 (AI as system infrastructure, exemplified by Microsoft Fluent's Copilot UI Kits).
IBM Carbon for AI is praised for its tokenized AI identity, where an interactive AI label doubles as a trigger for layered explanations.
AWS Cloudscape offers detailed patterns for user-authorized actions, defining authorization levels, human-in-the-loop sequences, and rules against irreversible changes without explicit consent.
GitLab Pajamas emphasizes designing AI to be collaborative, not autonomous, with risk assessments and fallback rules for when AI confidence is low.
Red Hat PatternFly highlights the importance of a "value gate," questioning whether AI is truly needed before implementation, a principle also echoed by Apple and GitLab.
Critical gaps in most public design systems include guidance on AI confidence levels, source attribution, hallucination recovery, user correction loops, and multi-agent workflows.
The author stresses that the true value of a design system for AI lies not just in component creation, but in documenting when and how to use AI responsibly and what happens when it fails, treating AI documentation as living infrastructure.

Decoder

Design system: A set of interconnected patterns and shared practices that provide consistency and efficiency in product development and design.
Explainability framework: A set of guidelines or tools designed to help users understand why an AI system made a particular decision or prediction.
Hallucination (AI): When an AI model generates information that is plausible-sounding but incorrect or fabricated, not based on its training data.
Copilot UI Kits: A specific set of UI components and patterns, developed by Microsoft Fluent, designed for integrating AI assistant experiences (like Copilot) into applications across various platforms.

AI startupchinafunding

DeepSeek slated to draw $7 billion in maiden fundraising, sources say

Chinese AI champion DeepSeek is reportedly raising $7.4 billion in its first funding round, valuing the company at up to $59 billion and intensifying the US-China tech rivalry.

CNBC

Summary

What: DeepSeek, founded by Liang Wenfeng, is set to raise about 50 billion yuan ($7.4 billion) from investors including Tencent Holdings and CATL. This round, expected to close in weeks, could value the company between $52 billion and $59 billion. As of Feb. 26, 2026, it reportedly granted early access to its new AI model to Chinese companies, not American engineers.

Why it matters: This massive funding round for a Chinese AI firm, combined with reports of exclusive domestic access to its models, underscores China's aggressive push for self-sufficiency and leadership in AI, further escalating the technological competition with the U.S.

Original Article

DeepSeek reportedly has not shared its upcoming AI model with American engineers and instead granted early access to Chinese companies, further intensifying the technological war between the U.S. and China, as of Feb. 26, 2026.

Chinese AI startup DeepSeek is set to raise about 50 billion yuan ($7.4 billion) in its first funding round from investors including Tencent Holdings and CATL, people with knowledge of the matter said.

The fundraising could value the company after the investment at between 350 billion yuan and 400 billion yuan, or between $52 billion and $59 billion, the people said, declining to be identified because the information is confidential.

DeepSeek became China's national AI champion and garnered global fame early last year, when its V3 and R1 models drew widespread praise in Silicon Valley and challenged U.S. assumptions about China's AI capabilities.

Tencent, CATL set to be biggest external investors

The startup's founder, Liang Wenfeng, has committed 20 billion yuan of his own money, the people said, adding that tech conglomerate Tencent is considering 10 billion yuan and battery giant CATL is looking at 5 billion yuan, which would make them the largest external investors in the round.
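
The reported figures can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted in the article and makes two labeled assumptions: the exchange rate is inferred from the article's own yuan and dollar figures rather than taken from a market quote, and the Tencent and CATL amounts are treated as if finalized even though the article says they are still under consideration.

```python
# All amounts are in billions and come from the figures reported above.
ROUND_YUAN_BN = 50.0                   # total round size in yuan
ROUND_USD_BN = 7.4                     # the same round expressed in dollars
VALUATION_YUAN_BN = (350.0, 400.0)     # post-investment valuation range

# Tentative commitments; Tencent and CATL are still "considering" these.
commitments_yuan_bn = {
    "Liang Wenfeng": 20.0,
    "Tencent": 10.0,
    "CATL": 5.0,
}

# Assumption: exchange rate implied by the article's own conversion.
implied_yuan_per_usd = ROUND_YUAN_BN / ROUND_USD_BN

# How much of the round the three named backers would cover.
named_total = sum(commitments_yuan_bn.values())
named_share = named_total / ROUND_YUAN_BN
remaining_for_others = ROUND_YUAN_BN - named_total

# Dollar valuation range at the implied rate.
valuation_usd_bn = tuple(v / implied_yuan_per_usd for v in VALUATION_YUAN_BN)

# Fraction of the company the round would buy, since the valuation
# is described as "after the investment" (post-money).
stake_sold = tuple(ROUND_YUAN_BN / v for v in VALUATION_YUAN_BN)

print(f"implied rate: {implied_yuan_per_usd:.2f} yuan per dollar")
print(f"named backers: {named_total:.0f}bn yuan ({named_share:.0%} of the round)")
print(f"left for other investors: {remaining_for_others:.0f}bn yuan")
print(f"valuation: ${valuation_usd_bn[0]:.0f}bn to ${valuation_usd_bn[1]:.0f}bn")
print(f"stake sold: {stake_sold[1]:.1%} to {stake_sold[0]:.1%}")
```

Under these assumptions the three named backers account for 35 billion of the roughly 50 billion yuan, and the dollar valuation range reproduces the article's $52 billion to $59 billion.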

DeepSeek is also in final talks with China's national artificial intelligence fund, gaming developer NetEase and e-commerce giant JD.com, they said, noting that the planned number of investors was fewer than 10.

DeepSeek, Liang, NetEase, JD.com and the China Integrated Circuit Industry Investment Fund, which is the main backer of the national AI fund, did not immediately respond to Reuters' requests for comment. Tencent and CATL declined to comment.

The line-up underscores China's efforts to build an increasingly self-sufficient AI industry, from models to the energy infrastructure needed to power them.

CATL, best known as a dominant supplier in the electric vehicle battery supply chain, has recently pushed into AI data centres, exploring opportunities to provide power equipment and energy storage solutions as AI workloads drive demand for large-scale, reliable power.

Tencent has sought to promote its own AI model, Hunyuan, but trails domestic market leaders including ByteDance's Doubao and DeepSeek. A closer relationship with DeepSeek could help Tencent keep pace with rival Alibaba, which has prioritised its in-house Qwen AI model.

Other investors in the mix

DeepSeek is expected to complete the round within the next couple of weeks, said the people, cautioning that financial details could still change.

The startup has to date not made any statements about whether it plans an initial public offering sometime in the future.

Hong Kong-headquartered IDG Capital and Monolith Capital are also among the prospective investors, the people said.

IDG declined to comment while Monolith did not respond to a request for comment.

AI hardwarestartupopenai

OpenAI makes its next hardware move with Opal Electronics

OpenAI is leading a funding round for Opal Electronics, known for high-end webcams, to develop new AI-native devices for creative work, aligning with Sam Altman's "ambient computing" vision.

TestingCatalog

Summary

What: OpenAI is investing in San Francisco startup Opal Electronics, which plans to launch a new line of AI-native devices beyond its current C1 and Tadpole webcams. This move supports OpenAI's broader strategy into hardware, even as its own Jony Ive-designed ambient computing project faces delays until 2027.

Why it matters: OpenAI's investment in Opal Electronics suggests a strategic pivot to explore alternative hardware form factors and accelerate user feedback on AI integration, while its more ambitious, screenless ambient computing device with Jony Ive navigates significant development challenges.

Decoder

Ambient computing: A vision for technology where computing is seamlessly integrated into the environment, becoming virtually invisible and always available without explicit user interaction, often through lightweight, screenless devices.

Original Article

OpenAI is investing fresh capital into hardware by leading a new funding round for Opal Electronics, a San Francisco startup renowned for its high-end webcams. Known for creating the C1 and the pocket-sized Tadpole, Opal is reportedly preparing to launch a new product line in the coming months. This move will extend Opal's reach beyond cameras and into AI-native devices designed for creative work.

the table. an update on opal electronics. https://t.co/91eGxO5qV6
— Opal (@opalelectronics) June 2, 2026

The specifics of this new device remain unconfirmed. However, Opal’s background suggests it will be vision- and capture-oriented, potentially utilizing OpenAI’s image, video, and real-time voice models as its foundation. Integrating voice into a physical product would provide OpenAI with valuable insights into how users interact with an always-listening companion, offering data that a chat window cannot supply. Details such as form factor, pricing, and exact capabilities are still under wraps.

This investment aligns with a broader trend. Over the past year, OpenAI has been focusing on hardware development, inspired by Sam Altman’s vision for “ambient computing.” This concept involves lightweight gadgets that can sense the world in real time without relying on a screen. OpenAI's most notable project in this area, a palm-sized, screenless device developed with Jony Ive following the multibillion-dollar io acquisition, has faced delays. Originally slated for release, it has now been pushed to 2027 due to software, privacy, and computational challenges, and has lost its original name due to a trademark dispute. Despite these setbacks, Chris Lehane has maintained that devices remain a top priority for the company through 2026.

👀 Interesting... Google AI Studio product lead quoting https://t.co/AKoFUgA9O6 website, who just announced funding from @openai https://t.co/0gIK48vVJf pic.twitter.com/3gsbweqon4
— 🚨 AI News | TestingCatalog (@testingcatalog) June 3, 2026

In this context, the partnership with Opal appears to be more than a standalone venture; it is a strategic move to explore form factors, expedite product release, and gather user feedback while the flagship project continues to develop. For a company intent on moving beyond the smartphone, investing in the hardware that will eventually replace it is a logical progression, making this development one to watch as the first products begin to emerge.

AI researchllmmachine-learning

Sleep for Continual Learning

Google researchers propose a "Sleep" paradigm for large language models, allowing them to consolidate short-term memories into long-term knowledge and self-improve through "Dreaming."

arXiv

Summary

What: Google researchers Ali Behrouz, Farnoosh Hashemi, and Vahab Mirrokni introduced a "Sleep" paradigm in an arXiv paper (2606.03979, submitted June 2, 2026) for LLMs to continually learn. It involves "Memory Consolidation" (Knowledge Seeding via distillation) and "Dreaming" (reinforcement learning to generate synthetic curricula).

Why it matters: This research explores bio-inspired mechanisms for AI, suggesting a path for LLMs to overcome catastrophic forgetting and achieve more robust, autonomous continual learning, potentially reducing the need for constant retraining on new data.

Deep Dive

Researchers from Google, Ali Behrouz, Farnoosh Hashemi, and Vahab Mirrokni, published a paper titled "Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories" on arXiv (2606.03979) on June 2, 2026.
The paper introduces a "Sleep" paradigm for Large Language Models (LLMs) to enable continual learning, drawing inspiration from human learning processes.
The "Sleep" process has two main stages: Memory Consolidation and Dreaming.
Memory Consolidation involves "Knowledge Seeding," an upward distillation process where memories from a smaller network (smaller-self) are transferred to a larger network.
This distillation uses a new Generalized Distillation process, combining on-policy distillation with Reinforcement Learning (RL)-based imitation learning.
Dreaming is a self-improvement phase where the model uses RL to generate a curriculum of synthetic data.
This synthetic data is used to rehearse new knowledge and refine existing capabilities without human supervision.
Experiments showed the importance of the sleep stage for long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks.
A version of this work has been publicly available since September 2025 on OpenReview.
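
To make the "upward distillation" idea concrete, here is a minimal sketch of a standard temperature-scaled distillation loss, with the smaller network acting as teacher and the larger one as student. It is not the paper's Generalized Distillation, which the authors describe as combining on-policy distillation with RL-based imitation learning; this is only the plain KL-divergence objective from classic knowledge distillation, and the logits and function names are made up for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher_logits: Sequence[float],
                      student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """Classic distillation loss: T^2 * KL(teacher || student).

    In the upward setting sketched here, the teacher is the smaller
    network holding the memories and the student is the larger network.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(teacher_probs, student_probs)


# Hypothetical next-token logits over a four-token vocabulary.
small_model_logits = [2.0, 0.5, -1.0, 0.1]
large_model_logits = [0.3, 0.2, 0.1, 0.0]

loss_before = distillation_loss(small_model_logits, large_model_logits)
loss_matched = distillation_loss(small_model_logits, small_model_logits)
```

Minimizing this loss pushes the larger network's output distribution toward the smaller network's, which is the sense in which knowledge is preserved while capacity grows. The loss is zero when the two distributions already match.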

Decoder

Distillation: In machine learning, a technique where a smaller, simpler "student" model is trained to mimic the behavior of a larger, more complex "teacher" model, often to compress knowledge or improve efficiency.
In-context learning: The ability of a large language model to learn new tasks or adapt to new information provided within the input prompt, without requiring retraining or fine-tuning of its parameters.
Catastrophic forgetting: A phenomenon in neural networks where learning new information causes the model to rapidly forget previously learned information.
Reinforcement Learning (RL): An area of machine learning where an agent learns to make decisions by performing actions in an environment and receiving rewards or penalties.

Original Article

The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learning, existing models lack the ability to continually learn and effectively transfer their temporal in-context knowledge to their long-term parameters. Inspired by the human learning process, we introduce a "Sleep" paradigm that allows the models to continually learn, distill their short-term fragile memories into stable long-term knowledge with replay, and recursively improve themselves with a "Dreaming" process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, where the memories of a smaller-self are distilled into a larger network to provide more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase, where the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.

Submission history

[v1] Tue, 2 Jun 2026 17:56:55 UTC (2,961 KB)

AI startupresearchhardwarellm

Inside Meta's attempts to play catch-up with AI

Mark Zuckerberg installed Alexandr Wang to overhaul Meta's AI efforts, leading to the release of Muse Spark and ambitious plans for "personal superintelligence," despite internal skepticism and a "rough start."

Ars Technica

Summary

What: Mark Zuckerberg recruited Alexandr Wang, co-founder of Scale AI, nearly a year ago to lead Meta's AI revival, investing $15 billion in Scale AI. Wang's secretive TBD Lab (around 100 researchers) recently launched Muse Spark, Meta's "most credible AI model yet," on April 3, 2026, focusing on visual understanding and future coding capabilities.

Why it matters: Meta's aggressive investment and restructuring under Wang highlight the intense race among tech giants for AI dominance, particularly in foundational models. It also reveals the challenges of integrating startup culture and rapid innovation within a large, established tech company, and the internal tensions arising from such a shift.

Takeaway: If you work on Meta's AI teams outside of TBD Lab, be aware of potential internal political shifts and an increased focus on proprietary models over traditional open-source approaches.

Deep Dive

Mark Zuckerberg appointed Alexandr Wang, co-founder of Scale AI, to lead Meta's AI efforts nearly a year ago, investing $15 billion in Scale AI as part of this push.
Wang established the secretive TBD Lab, comprising about 100 elite researchers working in a restricted area of Meta's Menlo Park headquarters.
In April 2026, TBD Lab released Muse Spark, described as Meta's "most credible AI model yet," which has been praised for visual understanding but lags rivals in coding.
Future models from TBD Lab are expected to focus on coding, agentic tasks, and advanced multimodal capabilities, including video generation.
Wang's leadership, while seen by proponents as effective in accelerating research, has drawn criticism internally for being frenetic and overstating incremental progress.
There are internal tensions, with some employees feeling that Wang did not fully acknowledge the contributions of Meta's pre-existing AI infrastructure and the Llama 4 team.
Wang has prioritized advancing models for "personal superintelligence" and has advocated for greater emphasis on proprietary models, potentially shifting Meta's long-standing open-source approach.
Meta is spending tens of billions on AI, with Muse Spark and future TBD models expected to improve content/ad targeting and underpin AI assistants, business agents, and avatars.
The company faced internal backlash and rolled back parts of a plan to use tracking software that would capture employee computer usage for AI training.
Muse Spark's external access has been limited to a private API, making it difficult for outsiders to assess broadly.

Decoder

Personal superintelligence: A concept, championed by Mark Zuckerberg and Alexandr Wang, envisioning an AI that is profoundly intelligent and personalized to each user, capable of assisting across many aspects of their digital and real-world lives.
Multimodal capabilities: The ability of an AI model to process and understand information from multiple types of data, such as text, images, audio, and video.
Agentic tasks: Tasks that involve an AI model autonomously planning, executing, and monitoring a series of actions to achieve a goal, often interacting with various tools or systems.

Original Article

A year after Mark Zuckerberg installed Alexandr Wang to jolt Meta’s artificial intelligence efforts into wartime mode, the $1.5 trillion company has produced Muse Spark, its most credible AI model yet.

By handing responsibility for Meta’s AI revival to a then-28-year-old start-up founder rather than a veteran researcher, Zuckerberg bet that an outsider’s urgency and ambition could succeed where the company’s established AI organization had struggled.

According to interviews with current and former Meta employees, and associates of Wang, the billionaire wunderkind has now begun to eke out results, while navigating criticism over his experience, early research challenges, and the esoteric internal politics of working at a Big Tech behemoth.

In nearly 12 months, Wang has assembled an elite research group on multimillion-dollar salaries, reshaped parts of Meta’s AI operation, and emerged as one of the most influential executives inside the company—the only Meta leader alongside Zuckerberg to attend a White House dinner with top Silicon Valley figures last year hosted by President Donald Trump.

In April, Meta also released Muse Spark, the first major model to emerge from Wang’s secretive research group, known as TBD Lab.

Wang’s proponents view the release of the model as the clearest sign yet that Meta’s AI rebuilding effort is gaining traction and are confident that successor models—expected to launch in the coming months—could further close the gap with OpenAI, Google, and Anthropic.

“The amount of work the TBD Lab was able to do in a short amount of time is very impressive,” said Russ Salakhutdinov, a computer science professor at Carnegie Mellon University and Meta’s former vice president of AI research. “Alex knows what he doesn’t know and he’s willing to listen.”

Others inside Meta are far less convinced. Critics describe Wang’s leadership as frenetic, arguing he has overplayed what is more incremental progress. Some current and former employees are skeptical that Meta can gain a leading position in frontier AI under Wang.

“The TBD folks, Alex and Zuck too, set a pretty low bar for Muse Spark internally and externally,” said one former Meta AI employee. “The other labs are moving fast.”

Meta said: “Alex’s record speaks for itself: In less than a year, he’s helped build one of the strongest research teams in the industry and led Meta Superintelligence Labs as it launched Muse Spark and established the scientific and technical foundations to scale even more advanced models. We’re excited for everyone to see what they do next.”

Meta is spending tens of billions of dollars on AI, with investors demanding evidence the outlays will translate into revenue. Muse Spark, and future TBD models, are expected to improve Meta’s content and advertising targeting machines, and also underpin initiatives ranging from AI assistants and business agents to digital avatars and wearables.

Wang was recruited after Meta’s AI efforts suffered a series of setbacks last year, culminating in the disappointing reception to the Llama 4 model and growing concern inside the company that rivals were pulling further ahead.

Zuckerberg responded by investing $15 billion in Wang’s data-labeling startup Scale AI and hiring its co-founder.

Scale AI had worked closely with leading AI labs, with Zuckerberg believing that Wang’s network and operational intensity could help rebuild Meta’s research organization.

Granted unusual autonomy and secrecy, Wang quickly assembled TBD Lab, a handpicked group of about 100 researchers working from a secure area of Meta’s Menlo Park headquarters that requires special badges to enter, according to people familiar with the operation.

Both Wang and Zuckerberg have offices inside the work area, while non-TBD staff have occasionally been caught trying to sneak in.

Early on, TBD encountered some teething problems, according to multiple people familiar with the matter. Some staff were poached by rivals, including Ruoming Pang, a former Apple executive, who left for OpenAI after just seven months.

Certain research efforts, including initiatives to develop an entirely new codebase for training models, have faced challenges, several people said.

In the end, Muse Spark was built using some elements of Meta’s pre-existing AI infrastructure, including code and datasets associated with Llama 4, according to people familiar with the project.

Subsequent comments by Wang suggesting Muse Spark had been developed “from scratch” irritated some who felt the contributions of the Llama team were not acknowledged, in a sign of deepening tensions between the company’s established AI teams and TBD Lab.

With the TBD team in place, Wang has sought to establish a roadmap that combines his and Zuckerberg’s vision for “personal superintelligence” with the convictions of individual researchers and the practical realities of scaling the infrastructure needed to train future generations of models, according to people familiar with his thinking.

He has also reshaped Meta’s AI safety work with a new team known internally as TBA, or “To Be Aligned.”

In leadership discussions with executives, including Zuckerberg, Wang has prioritized advancing the models while some other leaders have been more concerned with quickly rolling out AI products, according to people familiar with the conversations.

During internal presentations to the AI team known as “Vibe Checks,” Wang espouses an idealistic push towards developing AI so smart that it might solve the world’s problems, at odds with the focus of others on social media applications, one insider said.

Several people said Wang had also advocated placing greater emphasis on proprietary models over Meta’s longstanding open source approach.

Wang has tried to build support for his vision by cultivating a non-hierarchical start-up culture inside TBD. On a recent podcast, he argued that “the very small team where everyone is ‘cracked’ is always going to move faster than the large org where responsibility is distributed,” using gamer slang to describe highly talented engineers.

He also hosts regular boba tea-fuelled happy hours to foster camaraderie inside the secretive group, according to insiders.

Meta’s broader workforce has experienced a less convivial period. Wang’s first year has coincided with restructurings and rounds of layoffs across the company, seeking to offset the cost of its AI spending spree.

Some employees have also protested company plans to install tracking software that would capture their computer usage in order to train AI models. Meta on Tuesday told staff in a memo, seen by the FT, that it would roll back parts of the plan following the backlash.

Muse Spark has also been deployed primarily inside Meta’s own products, making it difficult for outsiders to assess. Wang had indicated that some external companies would receive access through a private API, but that rollout has been limited.

The model was trained using some third-party open-source models, including Chinese ones. Some insiders have compared aspects of the system with DeepSeek’s latest model, although the extent of any similarities remains disputed.

Muse Spark has been praised for visual understanding, but Wang has acknowledged it trails rivals in coding. Several employees said staff asked to test the model for software development tasks continued to prefer Anthropic’s Claude.

Future Meta models are expected to focus on coding, completing agentic tasks, and more advanced multimodal capabilities, including video generation.

“It was a rough start for him to find his power at the company,” said one associate. “But he’s found his groove.”

AI enterprisemobileweb

Be There for Every Customer With Meta Business Agent

Meta has launched Meta Business Agent, an AI tool for businesses to manage customer interactions and sales on WhatsApp, Messenger, and Instagram, expanding globally to all sizes.

Summary

What: Meta introduced Meta Business Agent on June 3, 2026, an AI tool that allows businesses to automate customer service, recommendations, appointments, and sales across WhatsApp, Messenger, and Instagram. Over one million businesses already use it, and it's expanding globally with future paid subscription offerings.

Why it matters: This move signifies Meta's deep integration of AI into its core business communication platforms, aiming to capture a larger share of the enterprise market by offering scalable, automated customer engagement solutions. It positions Meta's messaging apps as critical AI-powered conduits for business operations.

Takeaway: If your business uses Meta's messaging platforms, consider exploring the free Meta Business Agent to automate customer interactions and manage sales, especially if you anticipate high volume.

Deep Dive

Meta launched the Meta Business Agent on June 3, 2026, an AI tool designed for businesses to manage customer interactions efficiently across WhatsApp, Messenger, and Instagram.
Over one million businesses are already using a Meta Business Agent on WhatsApp and Messenger; across WhatsApp, Messenger, and Instagram, people have more than one billion active threads with businesses every day.
The agent can be set up in minutes or integrated into existing enterprise infrastructure, which Meta says lets businesses "10X or 100X" their output.
Key capabilities include answering business-specific questions, providing product recommendations from a catalog, booking appointments, qualifying leads, and closing sales.
Meta is expanding the Business Agent globally to businesses of all sizes, with initial access being free, and future paid subscription offerings to come.
The Meta Business Agent Platform is also introduced, providing infrastructure for larger businesses to build, customize, and deploy agents at scale, connecting to systems like Shopify and Zendesk.
The platform includes enterprise-grade controls, guardrails, and measurement tools to define rules and offer personalized experiences within messaging apps.
Meta is also making it easier for customers to discover businesses powered by the Business Agent directly on WhatsApp through search or sharing contact information.
Future capabilities are planned to expand beyond customer service to fully run daily operations, including market research, product insights, calendar management, and competitive intelligence.

Original Article

Today, we’re introducing Meta Business Agent – AI that lets every business show up for every customer as if they had an infinite team behind them. Business Agent can be set up in minutes or plugged directly into your existing enterprise infrastructure so you can 10X or 100X output.

More than one million businesses are already using a Meta Business Agent on WhatsApp and Messenger to respond to customers around the clock. And because there are more than one billion active threads with businesses on WhatsApp, Messenger and Instagram every day, a Business Agent can deliver more relevant, personalized experiences from the very first interaction. (Updated on June 3, 2026 at 1:40PM PT to reflect the latest business conversation stats)

We’re now expanding our Business Agent to businesses of all sizes globally, so you can have yours up and running within minutes, responding in your customers’ local languages using your tone. Your Business Agent can:

Answer questions specific to your business
Make product recommendations from a business catalog
Book appointments and qualify incoming leads
Let you decide when a team member steps in to provide support
Close sales

We’re also expanding Business Agent to Instagram. Businesses can activate their Business Agent here, and getting started is free. In the coming months, businesses will access the agent through paid subscription offerings, with options for businesses of every size.

An AI Agent That Works for You

Because the Meta Business Agent responds to customers, it doubles as a partner that can deliver a morning briefing to catch you up on chats you missed overnight and provide insights on your threads. We’re starting with a select number of businesses on the WhatsApp Business app, Instagram Pro, Messenger, and Meta Business Suite, and in the future we’ll expand its capabilities to help fully run all your daily operations — like conducting market research, surfacing product insights, connecting with the tools to manage your calendar and providing competitive intelligence. You can join the waitlist here.

Personalizing Customer Experiences

We’re also introducing the Meta Business Agent Platform: a new agentic platform that gives businesses the infrastructure to build, customize, and deploy their Business Agent at scale. It lets businesses connect to a growing suite of hundreds of systems like Shopify, Zendesk, and Shopee, giving Business Agents the ability to take action on behalf of the business. The platform provides larger businesses with enterprise-grade controls, guardrails, and measurement built in so they can define rules and offer personalized experiences, starting within the messaging apps their customers already use.

For businesses using WhatsApp, the Meta Business Agent Platform works alongside our Business Platform, and we support Messenger and Instagram as well.

Helping Businesses Get Discovered by New Customers

We’re also making it simpler for people to discover businesses powered by a Meta Business Agent directly on WhatsApp. Soon, people will be able to find a business by typing its name in the Search bar, or by sharing its phone number or contact card in chats with friends and family. This way, when more customers reach out, they get a quick, helpful response.

We’re excited to hear how the Meta Business Agent can give businesses the support they need to succeed — no matter their size.

Tech aienterprisesocial-media

Mark Zuckerberg Wants Meta's New AI Agents to Run Your Whole Business

Mark Zuckerberg's Meta has launched a free AI agent on WhatsApp, Instagram, and Messenger to help businesses automate customer interactions, with plans to expand its capabilities to manage entire operations.

The Wall Street Journal

Summary

What: Meta has introduced a free AI agent for businesses across its platforms (WhatsApp, Instagram, Messenger) that can answer customer questions, book appointments, and close sales, with future plans for full business management.

Why it matters: This indicates Meta's strategy to deeply embed AI into its popular messaging services, transforming them into comprehensive business operating platforms and creating a new subscription revenue stream for AI tools.

Takeaway: If you run a small business using WhatsApp, Instagram, or Messenger, explore Meta's new free AI agent for automating customer service and sales tasks.

Original Article

Meta has launched an AI agent for businesses on WhatsApp, Instagram, and Messenger. The agent can answer customer questions, book appointments, close sales, and perform other functions. Meta plans to expand its capabilities so that it can eventually help run whole businesses. The agent is free to use for now, but Meta will eventually shift it to being a paid subscription service with different tiers.

Tech hardwarespacelaunchengineering

Blue Origin vows to resume New Glenn flights by year's end

Blue Origin CEO Dave Limp vows to resume New Glenn rocket flights by the end of 2026, despite a "spectacular" launch pad explosion last week that destroyed the rocket and its transporter-erector.

Spaceflight Now

Summary

What: Blue Origin CEO Dave Limp stated on X that New Glenn rocket launches will resume by the end of 2026, confirming that damage from last week's explosion at Cape Canaveral Space Force Station's launch pad 36 was less severe than feared, with propellant tanks and the processing hangar intact.

Why it matters: The rapid commitment to resume flights suggests the root cause of the explosion during a BE-4 engine test firing was likely not a fundamental design flaw, which is good news for United Launch Alliance, which also uses BE-4 engines in its Vulcan rocket.

Deep Dive

Blue Origin's New Glenn rocket exploded during a first stage engine hot-fire test at Cape Canaveral Space Force Station's launch pad 36 last Thursday.
The explosion destroyed the New Glenn rocket and its transporter-erector.
CEO Dave Limp announced on X that the damage to launch infrastructure (propellant tanks, processing hangar) was not as bad as initially thought, and can be repaired in place.
Another New Glenn first stage booster and three upper stages were safe in a nearby integration facility.
Blue Origin plans to replace its transporter-erector with an alternative vertical rocket assembly capability.
The company's motto, "Gradatim Ferociter," means "step by step, ferociously."
The New Glenn is crucial for NASA's Artemis moon program, which relies on Blue Origin (and SpaceX) for lunar landers.
NASA Administrator Jared Isaacman and Kennedy Space Center Director Brian Hughes remain optimistic about the 2028 lunar landing goal.
The BE-4 engines, also used by United Launch Alliance (ULA) for its Vulcan rocket, are not yet blamed for the mishap.

Decoder

Transporter-erector: A large vehicle used to transport a rocket to its launch pad and then raise it into a vertical launch position.
Hot-fire test: A test where a rocket's engines are ignited and run while the rocket remains securely bolted to the launch pad, to verify propulsion systems and procedures.

Original Article

Despite a spectacular launch pad explosion last week, Jeff Bezos’s rocket company Blue Origin said Tuesday the damage was not as severe as initially feared and that the company plans to resume New Glenn rocket launches by the end of the year.

Blue Origin CEO Dave Limp, in an overnight post on the social media platform X, said propellant tanks at launch pad 36 at the Cape Canaveral Space Force Station made it through the blast in good shape, as did a nearby processing hangar. The main support gantry, while damaged, can be repaired in place.

“Now that we’ve had access to the pad and integration facility we can share a bit of good news,” Limp said. “The propellant farm, oxygen, liquid hydrogen and LNG [cryogenic methane] tanks are all in good shape. This is good luck because these are very long lead items.

“The water tower is also good. The big support tower is damaged, but it can be repaired in place rather than torn down and replaced.”

The New Glenn rocket that blew up on pad 36 last Thursday was destroyed along with its transporter-erector, used to move the rocket to the pad surface and then rotate it to vertical. But Limp said another New Glenn first stage booster and three upper stages housed in a large hangar-like “integration facility” at the base of the pad “look good.”

“We had already been working for some time on eliminating our transporter-erector in favor of an alternative vertical (rocket assembly capability), and we’ll now go directly to that; so we don’t need a new transporter-erector.”

No word yet on what might have caused the explosion, but Limp closed his post by declaring: “We will fly again before the end of this year. Gradatim Ferociter.” The Latin expression, Blue Origin’s motto, means “step by step, ferociously.”

Blue Origin was preparing to launch its third New Glenn later this month to put a batch of Amazon Leo internet satellites into orbit. Last Thursday, engineers loaded both stages with supercold liquid methane and oxygen for a first stage engine test firing to verify its readiness for flight. The Leo satellites were not aboard.

Such “hot-fire” tests are fairly routine in the rocket industry, giving engineers a chance to test launch-day fueling procedures, a booster’s propulsion system and critical ground and flight software while the rocket remains securely bolted to its launch pad.

But it was far from routine last Thursday.

As the New Glenn’s seven BE-4 engines began igniting and throttling up, a fire broke out at the base of the booster and moments later, now engulfed in flames, the rocket exploded in a tremendous fireball, shaking the ground for miles around in a conflagration visible all the way across the Florida peninsula.

Footage captured by photojournalists from a helicopter the next day showed that the rocket and its transporter-erector had been destroyed, that at least some support beams at the base of the main gantry were either bent or blown away, and that a separate lightning tower had collapsed in a tangle of debris.

Unlike rival SpaceX, which has two operational pads in Florida and one in California, Blue Origin only has pad 36. The company already had plans to build a second pad at the Cape and another at Vandenberg Space Force Base in California. But in the near term, New Glenns cannot fly until pad 36 is repaired.

That’s a problem for NASA’s Artemis moon program and the agency’s drive to beat the Chinese to the lunar surface. Chinese officials have said they plan to land their own “taikonauts” on the moon by the end of the decade.

To win this self-declared “space race,” NASA is relying on both SpaceX and Blue Origin to launch new moon landers into Earth orbit next year for rendezvous and docking tests with Artemis astronauts in an Orion capsule.

If those tests go well, NASA hopes to launch one, and possibly two, astronaut moon landing missions in 2028, soon followed by two flights per year thereafter before beginning assembly of a moon base near the lunar south pole where astronauts can live and work for months at a time.

Blue Origin’s lander would give NASA an alternative to SpaceX’s, a variant of the company’s Starship rocket. SpaceX has had its own problems perfecting the Super Heavy-Starship rocket needed to launch its lander, and it’s not yet clear if they will be ready for the Artemis III Earth-orbit test flight next year as currently planned.

Blue Origin’s New Glenn also is needed to launch prototype rovers and other science experiments to the moon aboard an unpiloted cargo lander under contracts announced two days before last week’s explosion.

NASA Administrator Jared Isaacman remains optimistic about landing Artemis astronauts on the moon in 2028 using whatever landing craft is available.

“Blue Origin leadership has responded incredibly quickly, and NASA will do all we can to help with root cause analysis and accelerate pad recovery timeframes while staying extremely focused on progressing the lander,” he said on X.

Kennedy Space Center Director Brian Hughes, appointed to the post just last month, told the Space Florida board of directors Tuesday that NASA is “doubling down on the lunar lander.”

“We’ll be working with Blue and X lunar lander technology, and all of that is designed to keep us on path, meet the President’s goal, which is to have American boots back on the moon before the end of 2028,” he said. “Again, that’s not just something to tout, it’s an important demonstration of our nation’s abilities.”

Limp’s vow to resume flights by the end of the year might imply the “root cause” of the explosion might not have been an engine problem that would take months to correct and then test. Or at least, not a major design flaw.

That would be good news for United Launch Alliance, a partnership between Boeing and Lockheed Martin. ULA uses Blue Origin’s BE-4 engines in the first stage of its new Vulcan rocket. A drawn-out engine failure investigation would be a setback for ULA, but the BE-4s have not yet been blamed for the New Glenn mishap.

Tech airoboticsdatachinaresearch

China is training a robot future — one folded shirt at a time

China is gaining a scaling edge in robotics development by mobilizing large local workforces to collect massive, low-cost training data sets from real homes and factories.

Rest of World

Summary

What: Chinese tech companies like JD.com are rapidly collecting millions of hours of human movement data in homes and factories, paying residents and workers in the city of Suqian and in Guangdong province to film themselves performing chores and tasks, a strategy enabled by low labor costs and government support.

Why it matters: This localized, high-volume data collection model contrasts with the U.S. approach of outsourcing, potentially giving China an advantage in building robots better adapted to diverse real-world environments, accelerating physical AI development.

Deep Dive

Robotics development globally is bottlenecked by a shortage of complex visual and movement training data.
Traditional teleoperation for robot training is costly and limited in preparing robots for diverse real-world environments.
Chinese companies are using creative, low-cost methods to collect data in realistic settings, including egocentric (first-person) videos.
JD.com aims to generate 10 million hours of data over two years by recruiting 100,000 employees and 500,000 external workers in a "data collection neighborhood" in Suqian.
Workers in elderly care centers and kiwifruit farms are wearing head-mounted cameras.
Data vendors in Guangdong are working with electronics and packaging factories to collect assembly-line data using head cameras and wrist sensors.
This approach provides new job opportunities for people like Gao Bo, a stay-at-home mom earning $3/hour by filming chores.
Analysts like Marco Wang from Interact Analysis believe China leads in hardware and data ecosystems for robotics, while the U.S. leads in AI talent and model research.
Professor Alan Fern of Oregon State University notes that while the scaling logic from large language models is applied, it's still unproven whether this data will lead to truly intelligent robots capable of functioning in arbitrary environments.

Decoder

Egocentric data: First-person video data, typically recorded from head-mounted cameras, showing a user's perspective and hand movements while performing tasks.
Teleoperation: The control of a robot or machine from a distance by a human operator.

Original Article

China's low labor costs, government support, and public enthusiasm for robotics development are helping the industry mobilize large populations to work in robotics data collection. Robotics developers globally have ramped up data collection in real households since the beginning of the year. US companies face high labor costs and are outsourcing their data collection to workers in developing countries. Chinese companies are able to collect massive data sets locally, helping them build robots better adapted to domestic environments.

Tech devopswindowsopensourcecli

Microsoft brings coreutils to Windows

Microsoft has released coreutils for Windows, a set of Rust-based Unix-style command-line utilities that run natively on Windows, available through WinGet, aiming to ease cross-platform scripting.

OSNews

Summary

What: Microsoft announced coreutils for Windows at its Build conference, offering a Microsoft-maintained port of Rust-based GNU coreutils, findutils, and grep, providing familiar Unix-style commands like `cat.exe` and `grep.exe` natively on Windows.

Why it matters: This move aims to reduce friction for developers moving between Linux, macOS, WSL, and Windows by standardizing command-line tools, reflecting Microsoft's continued embrace of open-source and developer-centric features.

Takeaway: If you frequently use Unix command-line utilities on other platforms, install coreutils for Windows via WinGet to streamline your workflow on Windows.

Deep Dive

coreutils for Windows provides a set of UNIX-style command-line utilities (e.g., cat, grep, find) that run natively on Windows.
It is a port of the Rust-based rewrite of GNU coreutils, findutils, and grep (known as uutils).
The utilities ship as a single multi-call binary, exposing each utility under its standard name (e.g., cat.exe).
The goal is to make existing scripts and habits transfer seamlessly between Linux, macOS, WSL, containers, and Windows.
Each command supports the standard --help flag for syntax and options.
Caveats include handling Windows path separators (it handles both but output may vary), differences in ACLs vs. POSIX permissions, and the absence of /dev/null.
Some commands relying on POSIX-only concepts or deemed not useful on Windows are excluded.
Conflicts with built-in cmd.exe and PowerShell commands mean command execution depends on shell, PATH order, and PowerShell's alias table.
The project is in preview and installable through WinGet.

Decoder

coreutils: A package of fundamental command-line utilities typically found on Unix-like operating systems (e.g., ls, cp, mv, grep, cat, find).
WinGet: The official package manager for Windows, used to install applications and tools from the command line.

Original Article

At its Build conference, Microsoft announced coreutils for Windows.

Coreutils for Windows is a Microsoft-maintained set of UNIX-style command-line utilities that run natively on Windows — the same commands and pipelines you use on Linux, macOS, and WSL. It ships as a single multi-call binary that exposes each utility under its standard name (cat.exe, grep.exe, find.exe, and so on), giving you the everyday tools developers already use on other platforms to script, automate, and process text. For the full list, see Commands.

The goal is to remove friction when moving between Linux, macOS, WSL, containers, and Windows. The same commands, flags, and pipelines work the same way, so existing scripts and habits carry over without translation. Each command supports the standard --help flag for full syntax and options.

It’s a port of the Rust-based rewrite of the GNU coreutils, findutils, and grep. There are a few caveats though, since these ports have to deal with a number of Windows-isms. The first thing that comes to mind for most of us are path separators; these ports will handle both the correct and incorrect Windows/DOS one, but since some tools may output only the incorrect one this may affect piping. You should also take into account things like Windows’ ACLs vs. POSIX permission bits, the lack of /dev/null, and a few other oddities.
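The path-separator and /dev/null caveats can be worked around in a wrapper script rather than inside the tools themselves. The following is a minimal Python sketch of that idea, not anything shipped with coreutils for Windows: the helper names `to_forward_slashes` and `null_device` are hypothetical, and it relies only on the standard library's `pathlib.PureWindowsPath` and `os.devnull`.

```python
import os
from pathlib import PureWindowsPath


def to_forward_slashes(path_text: str) -> str:
    """Return a Windows-style path with every separator written as '/'.

    PureWindowsPath treats both '\\' and '/' as separators, so mixed
    input is parsed the same way regardless of the host OS.
    (Hypothetical helper, for illustration only.)
    """
    return PureWindowsPath(path_text).as_posix()


def null_device() -> str:
    """Return the discard device for the current OS.

    os.devnull is 'nul' on Windows and '/dev/null' on POSIX systems,
    so scripts do not need to hard-code either name.
    """
    return os.devnull


if __name__ == "__main__":
    # Mixed separators, as might appear when one tool's output feeds another.
    print(to_forward_slashes(r"C:\Users\dev\logs/app.log"))
    print(null_device())
```

Normalizing to one separator before piping output onward sidesteps the question of which separator a given tool emits, and `os.devnull` stands in for the missing /dev/null.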

Furthermore, there are a bunch of commands that rely on POSIX-only concepts, so those aren’t included, and a few other commands that aren’t useful on Windows are excluded as well. Since a number of commands conflict with built-in commands from cmd.exe and PowerShell, which commands run will depend on the shell, the PATH order, and PowerShell’s alias table.
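To see how PATH order decides which binary a name resolves to, a short Python sketch using the standard library's `shutil.which` can list the candidates. The function names `resolve_on_path` and `all_matches` are hypothetical, and the sketch covers only PATH lookup: cmd.exe built-ins and PowerShell aliases are resolved by the shell itself and will not show up here.

```python
import os
import shutil
from typing import List, Optional


def resolve_on_path(command: str, path: Optional[str] = None) -> Optional[str]:
    """Return the first executable named `command` found on PATH, or None."""
    return shutil.which(command, path=path)


def all_matches(command: str, path: Optional[str] = None) -> List[str]:
    """List each PATH entry's match for `command`, in PATH order.

    The first element is the one a plain PATH lookup would run; later
    elements are shadowed by it. (Hypothetical helper for illustration.)
    """
    search = path if path is not None else os.environ.get("PATH", "")
    matches: List[str] = []
    for directory in search.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        hit = shutil.which(command, path=directory)
        if hit and hit not in matches:
            matches.append(hit)
    return matches


if __name__ == "__main__":
    # 'find' and 'sort' are names that also exist as classic Windows tools.
    for name in ("find", "sort"):
        print(name, "->", resolve_on_path(name))
        for candidate in all_matches(name):
            print("   candidate:", candidate)
```

If the first candidate is not the one you expect, reordering PATH or calling the tool by its full path are the usual remedies.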

Everything’s in preview, and installable through WinGet.

About The Author

Thom Holwerda

12 Comments

2026-06-03 8:18 pm xeoron
If they want to do things right, just release Windows 12 as a Linux distro and Windows Server, as well with a explorer shell as the UX and wine making old Window apps run on Linux with porting tools in Visual Studio.

2026-06-04 9:24 am franksands
Yeah, I thought this over the years. I think it would benefit everyone. Just go full linux distro and use a VM or something similar for legacy apps

2026-06-04 11:17 am Drumhellar
That is a terrible idea. Porting Windows to the Linux kernel wouldn’t bring any benefit to Windows, and would break a huge amount of software in the process.

What would be the point?

2026-06-03 11:26 pm Alfman
Looking at the list of commands supported by this project, it’s such a small subset of unix that I think switching between windows and unix command sets would be more frustrating than useful. I’d prefer going all windows or all unix, not blending them together.

Incidentally the Mingw and Cygwin projects did this decades ago and provided a fairly comprehensive unix environment running on windows with tons of unix packages. It was even possible to build & run *nix software on windows. These unix tools are what I used to build native windows software. I even used unix tools to build windows kernel drivers, fun times 🙂

2026-06-04 9:27 am franksands
I had problems when I needed mingw or cygwin to interact with the “outside world”. So native windows versions are always welcome. Or go full linux like xeoron suggested

2026-06-04 12:14 pm Alfman
franksands,

> I had problems when I needed mingw or cygwin to interact with the “outside world”. So native windows versions are always welcome. Or go full linux like xeoron suggested

It’s probably too late, but I’d genuinely offer to help if you have a current need.

Cygwin and mingw worked differently. Cygwin was about porting unix APIs to windows. Software built for cygwin was kind of a hybrid unix/windows application. However mingw did not take this approach and actually builds native windows software without trying to make the software think it’s in a unix environment. You didn’t have to use any of the unix APIs, you could (and I did) build pure win32 software. This made mingw my go to choice for building proper native win32 software. I also tried LCC for a time but preferred mingw.

2026-06-04 6:15 pm LeFantome
@Alfman

This is just Microsoft releasing uutils stuff under their own banner. As they are Rust based, they pretty much already worked on Windows, and that is one of the uutils design goals. As uutils adds more of the full Linux suite, MS may add more.

MS lists the utils they dropped. I am not sure why they decided not to include dd.

2026-06-04 2:42 am newbie_sysadmin
As a sysadmin in a Windows-centric company and a Linux fan at home, I think the best shell for Windows administration is PowerShell.
A lot of the great things about unix-style commands simply cannot be ported to Windows because it’s an API-centric system while UNIXes are text-centric.
If you want to make an extremely quick and small script, then it’s faster to search or use an LLM or whatever and write in the native language of your OS rather than install a new tool THEN figure out implementation drift from the system you’re used to seeing it in…

And if you want something solid that will be re-used often you will always have a better experience with the native tool as well, I just don’t get these porting efforts.

Same way that when I moved from Windows to Linux at home I used PowerShell on Linux because bash felt too foreign, and yeah PS on Linux is just NOT that and that’s exactly because PowerShell is really made for an API-centric OS.
2026-06-04 6:35 am Matriks404
That’s odd, but overall probably a good thing for those people who use Linux/macOS/other Unix* every day but need to deal with Windows machines at work and have no WSL installed (because otherwise you would probably want to use actual full-featured GNU core utilities, along with other useful commands standard Linux distributions come pre-installed with).

2026-06-04 6:23 pm LeFantome
> you would probably want to use actual full-featured GNU core utilities

uutils are meant to have identical behaviour to the GNU utilities. If they are different, it is a bug. Unless Microsoft has crippled them, these will give you the same functionality as the GNU equivalents.

uutils uses the GNU test suite as a benchmark for functionality. When differences are found that the tests did not catch, uutils contributes new tests to the GNU suite to catch them in the future. You can see the size of the test suite growing as more people start using uutils (eg. in Ubuntu). There tend to be a bunch of new tests created after each uutils release.

https://raw.githubusercontent.com/uutils/coreutils-tracking/refs/heads/main/gnu-results.svg

2026-06-04 6:33 pm LeFantome
Sorry, “new tests created after each GNU coreutils release”.

2026-06-04 6:18 pm LeFantome
So Ubuntu and Windows ship some of the same utilities, both based on uutils. Crazy.

MacBook Neo is So Popular That Apple Reportedly Doubled Production

Apple reportedly doubled its MacBook Neo production target to 10 million units for 2026 due to "off the charts" customer demand for the affordable $599 laptop.

MacRumors

Summary

What: Apple's CEO Tim Cook described customer response to the MacBook Neo as "off the charts," and supply chain analyst Ming-Chi Kuo reported that the production target was doubled from 5 million to 10 million units for 2026. The MacBook Neo, starting at $599 ($499 for students) and powered by an iPhone A18 Pro chip, drove a record number of first-time Mac buyers. A second-gen model with an A19 Pro chip and 12GB RAM is expected next year.

Why it matters: This indicates Apple is successfully expanding its market share by offering a more accessible entry point to the Mac ecosystem, leveraging its mobile chip technology to achieve competitive pricing and performance.

Original Article

MacBook Neo is So Popular That Apple Reportedly Doubled Production

On an earnings call in late April, Apple's CEO Tim Cook said that customer response to the MacBook Neo was "off the charts," and the popularity of the laptop has reportedly led the company to significantly boost production.

Apple supply chain analyst Ming-Chi Kuo this week said he believes that MacBook Neo shipments to Apple were doubled from an initial target of 5 million units to 10 million units in 2026 at some point after the laptop launched in March.

Apple was very optimistic about the MacBook Neo before announcing it, but the company still "undercalled" the level of enthusiasm that the laptop would generate, according to Cook. He said that MacBook Neo demand exceeded Apple's expectations and helped to drive a record number of first-time Mac buyers last quarter.

New figures from market research firm IDC support Apple's claim that the MacBook Neo is selling well, and the Windows PC industry has taken notice. For example, Dell recently introduced a redesigned XPS 13 laptop from $699 and said it has features "you won't find on a MacBook Neo," such as a touch screen and a backlit keyboard.

"Apple's MacBook Neo is a capable machine, and its arrival confirms that there's real appetite for premium quality at accessible prices," admitted Dell.

With a starting price of $599 in the U.S., or $499 for college students, the MacBook Neo is Apple's most affordable MacBook ever. Powered by the iPhone's A18 Pro chip, the laptop is available in colorful finishes like Citrus and Blush.

A second-generation MacBook Neo is expected to be released next year with an A19 Pro chip and 12GB of RAM.

Tech devops infrastructure security networking

DNS Is for People - Not for IT Infrastructure

An article argues that DNS, while essential for public services, should largely be avoided for internal IT infrastructure to boost reliability, robustness, and security.

Louwrentius

Summary

What: The author contends that DNS introduces unnecessary complexity and potential failure points for machine-to-machine communication, citing high-profile incidents like the Meta outage. They suggest directly injecting IP addresses into configuration files or using `/etc/hosts` for internal services, and warn of DNS as an egress exfiltration risk if public DNS queries are allowed from internal systems.

Why it matters: This perspective challenges a common industry practice, advocating for a more minimalist and direct approach to internal networking configuration, highlighting how seemingly minor dependencies can have catastrophic "blast radii" in complex systems.

Takeaway: Review your internal IT infrastructure for critical services that rely on DNS and consider if direct IP addressing or `/etc/hosts` configurations could improve reliability and reduce attack surface.

Deep Dive

The author argues that DNS is primarily for human-readable domain names for public services and is often a liability in internal IT infrastructure.
DNS adds complexity and potential failure points, increasing the risk of outages due to unforeseen interactions or circular dependencies, as seen in incidents like the Facebook/Meta outage.
For machine-to-machine communication, DNS caching behavior (TTL) can lead to stale records and unreliable updates, requiring manual intervention.
The article suggests directly injecting IP addresses into configuration files using automation tools like Ansible or pyinfra.
Alternatively, using /etc/hosts files, which can also be provisioned at scale, offers human-readable names without relying on a DNS service.
DNS is highlighted as a generic security risk, as default DNS traffic is unencrypted, making it vulnerable to spoofing and other attacks if not secured with DNSSEC (which adds complexity).
DNS also presents an egress exfiltration risk: attackers can use DNS queries to an attacker-controlled nameserver to exfiltrate sensitive data even when direct outbound connections are filtered.
To mitigate egress risk, systems should be prevented from querying public DNS records, with external access facilitated through a proxy with allow-listed domains.
The overall recommendation is to explore avoiding DNS for internal IT infrastructure to enhance reliability, robustness, and security, balancing benefits and drawbacks with specific infrastructure context and risk appetite.

Decoder

Egress exfiltration: The unauthorized transfer of data out of a network, often by leveraging commonly allowed outbound network protocols like DNS to bypass security controls.
DNSSEC: Domain Name System Security Extensions, a suite of IETF specifications for securing data exchanged in the Domain Name System by cryptographically signing records.
TTL (Time-To-Live): A setting in a DNS record that tells a DNS resolver how long to cache the record before querying for a new one, impacting how quickly changes propagate.
/etc/hosts: A plain text file in Unix-like operating systems that maps hostnames to IP addresses, acting as a local DNS resolver for specific entries without querying external DNS servers.

Original Article

The Domain Name System exists because it's difficult for people to remember IP addresses (185.15.59.224) and much easier to remember domain names (wikipedia.org).

Regarding internet-accessible services, it makes sense to publish websites, API endpoints or similar services using DNS, as people have to interact with them. The added benefit of a domain name is that the associated IP address can change without the client being affected.

This article isn't against DNS for public services, but it questions if we should use DNS for internal IT infrastructure (independent of cloud vs. onprem).

It's always DNS

Although DNS can be a very beneficial service, it can also become a liability. If you want a reliable system, you want as few components as possible. Every additional component adds a potential risk of failure. In addition, more components may create unforeseen behaviour and interactions that can cause outages (circular dependencies, and so on). If you can avoid adding components, you'll have a better chance of building a reliable system.

Within the IT operations space, DNS has made a bit of a name for itself. Many may remember this little haiku.

It’s not DNS
There’s no way it’s DNS
It was DNS

(source)

There are multiple high-profile incidents where DNS was involved. In these cases, the root cause of the incident isn't the DNS system itself. Yet, because the root cause affects the DNS service, which is in the critical path for virtually all services, the incident has such a huge impact.

The Facebook / Meta outage was so significant because it locked people out of buildings (physical access) due to 'circular' dependencies on DNS being available. Again, it can be said that the circular dependency is the root cause, but the blast radius of DNS is in many cases so enormous that it may be difficult to have a clear end-to-end picture of potential risk.

The case against DNS for internal IT infrastructure

From the perspective of IT operations, DNS has a drawback: DNS clients cache DNS records based on TTL. Different DNS client implementations can behave differently, but even if you have a fairly homogeneous environment, the only way to ensure that clients (in this case other servers) use the updated IP address is to control them and force a DNS refresh.
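To make the caching drawback concrete, here is a minimal sketch of a client-side cache that honours TTLs. It is a toy model rather than a real resolver: the `TtlCache` class, the `db.internal.example` name, the addresses and the 300-second TTL are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    ip: str
    expires_at: float  # time (in seconds) after which the entry is refetched


class TtlCache:
    """Toy client-side cache; hypothetical, not a real DNS implementation."""

    def __init__(self, authoritative: dict[str, tuple[str, int]]):
        # authoritative maps name -> (ip, ttl_seconds); illustrative data only
        self.authoritative = authoritative
        self.cache: dict[str, CachedRecord] = {}

    def resolve(self, name: str, now: float) -> str:
        entry = self.cache.get(name)
        if entry is not None and now < entry.expires_at:
            return entry.ip  # served from cache, even if upstream has changed
        ip, ttl = self.authoritative[name]
        self.cache[name] = CachedRecord(ip, now + ttl)
        return ip

    def flush(self) -> None:
        # The "force a DNS refresh" step: drop everything that was cached.
        self.cache.clear()


zone = {"db.internal.example": ("10.0.0.5", 300)}
client = TtlCache(zone)

first = client.resolve("db.internal.example", now=0)
zone["db.internal.example"] = ("10.0.0.9", 300)  # the record is updated upstream
stale = client.resolve("db.internal.example", now=120)  # still inside the TTL
fresh = client.resolve("db.internal.example", now=301)  # TTL has expired
```

In this model the client keeps using the old address until either the TTL runs out or someone calls `flush()` on it, which is the operational burden described above.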

That got me thinking, why would we use DNS for infrastructure services? It isn't necessary for machine-to-machine communication. Instead of configuring domain names that may not resolve, we can just directly inject the appropriate IP address(es) into configuration files. It's easy to configure systems with tools like Ansible or pyinfra at scale.

The counter argument could be that DevOps / platform engineers are also humans, and it's much easier to spot misconfigurations or to troubleshoot if domain names are configured instead of IP addresses.

Fortunately, we still have /etc/hosts, which we can easily provision. Still no DNS service required! This way, we can configure domain names and pretend to use DNS. I also suspect that DNS queries against /etc/hosts are quite responsive.
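As a rough illustration of provisioning /etc/hosts from an inventory, the sketch below renders the file contents from a name-to-IP mapping. The `render_hosts` helper, the host names and the addresses are hypothetical; in practice the resulting text would be pushed out by whatever configuration tool is already in use.

```python
import ipaddress


def render_hosts(inventory: dict[str, str]) -> str:
    """Return hosts-file text for a name -> IP mapping, sorted for stable diffs."""
    lines = ["127.0.0.1 localhost"]
    for name, ip in sorted(inventory.items()):
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        lines.append(f"{ip} {name}")
    return "\n".join(lines) + "\n"


# Hypothetical inventory; real data would come from the provisioning tool.
inventory = {
    "db01.internal.example": "10.0.0.5",
    "cache01.internal.example": "10.0.0.6",
}
hosts_text = render_hosts(inventory)
```

Validating every address before writing the file matters here, because a typo in a hosts entry fails just as silently as a bad DNS record would.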

DNS as generic security risk

As of today, most network traffic is encrypted by default, or tunneled through an encrypted channel. DNS is - by default - the exception. Regarding internal IT infrastructure (cloud or 'onprem'), the network may be considered as a secure environment. An attack on the DNS service, spoofing packets, and so on, can be very disruptive though. Setting up DNSSEC may alleviate this problem, but that also introduces another administrative burden with its own risk of misconfiguration. It's yet another layer of complexity. And we assume that internal infrastructure supports DNSSEC.

DNS as an Egress Exfiltration risk

Because egress filtering (filtering of outbound connections) can be cumbersome, it's often omitted on the grounds that the systems involved are 'trusted'. This is unfortunate as this makes life easier for an attacker. Any kind of resource required for an attack can be acquired on the vulnerable system with a simple outbound query towards the internet. Proper egress filtering of network traffic can be the difference between a successful and unsuccessful hacking attempt.

A lack of egress filtering also makes it much easier for an attacker to exfiltrate data. And the thing is: any IP protocol can be used to exfiltrate data, including DNS.[1]

This is how: the attacker gets a domain and runs their own internet-accessible authoritative nameserver for this domain. Now the attacker can make DNS requests from the hacked system for names under said domain, like sensitivedata.evil.domain, and extract all the data from the rogue DNS server's logs.[2]

Although a hacked server may not be able to directly interact with the attacker-controlled DNS server, by issuing DNS requests for the attacker-controlled domain, these requests will pass the local forwarding DNS server and be forwarded towards the attacker-controlled authoritative DNS server. See also tools like dnscat2 or iodine.
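From the defender's side, query names that carry encoded data tend to stand out in the logs of the forwarding DNS server. The sketch below is one possible heuristic, not something taken from the article: it flags names containing a very long label or a label with high character entropy. The function names and all thresholds are assumptions and would need tuning against real traffic.

```python
import base64
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_exfiltration(qname: str, long_label_len: int = 30,
                            min_len_for_entropy: int = 16,
                            entropy_threshold: float = 3.5) -> bool:
    """Flag query names with a very long or random-looking label.

    All thresholds are illustrative guesses, not tuned values.
    """
    for label in qname.rstrip(".").lower().split("."):
        if len(label) >= long_label_len:
            return True
        if (len(label) >= min_len_for_entropy
                and shannon_entropy(label) > entropy_threshold):
            return True
    return False


# A label shaped like encoded data, as a compromised host might send it.
payload = base64.b32encode(b"sensitive data: password=hunter2").decode()
suspicious_name = payload.rstrip("=").lower() + ".evil.domain"

flagged = looks_like_exfiltration(suspicious_name)
benign = looks_like_exfiltration("www.wikipedia.org")
```

A heuristic like this only helps with detection after the fact; it does not replace blocking public DNS resolution from internal systems in the first place.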

Due to this risk, there is a case to be made for - at the very least - not allowing systems to query public DNS records. As servers may need to interact with services on the internet (update servers, APIs, and so on), such access can be facilitated by a proxy server using allow-listed domains.
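The allow-list decision such a proxy has to make can be sketched in a few lines. This is a simplified illustration, not the configuration of any particular proxy product; the `is_allowed` helper and the domains in `ALLOWED_DOMAINS` are hypothetical.

```python
# Hypothetical allow-list; a real one would live in the proxy's configuration.
ALLOWED_DOMAINS = frozenset({"updates.example.org", "api.example.com"})


def is_allowed(host: str, allowed: frozenset[str] = ALLOWED_DOMAINS) -> bool:
    """True if host equals an allow-listed domain or is a subdomain of one."""
    host = host.rstrip(".").lower()
    # Match on a leading dot so "evilapi.example.com" does not pass
    # just because it ends with "api.example.com".
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Under this check, `is_allowed("v2.api.example.com")` passes while a lookup for `sensitivedata.evil.domain` is rejected before any DNS resolution for it happens on behalf of the server.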

Evaluation and closing words

In the end, everything is a tradeoff, where people must balance benefits and drawbacks against the context of their infrastructure, their particular risk appetite and even organisational structure and culture.

That said, I think it's reasonable to explore if DNS can be avoided altogether within the IT infrastructure to increase reliability and robustness.

Feel free to share your thoughts and feelings about this if you feel so inclined. Maybe leave a comment on the Hacker News thread.

[1] Don't forget about services like NTP or ICMP.
[2] I have demonstrated this attack using this exact method with a domain I own for a customer that thought they had properly prevented egress traffic, including blocking NTP and ICMP.

Tech ai web policy search

Google offers opt-out of “AI” search results for websites, promises it won't affect regular search rankings

Google is offering website owners an opt-out from having their sites appear in generative AI Search features like "AI Overviews," promising it won't impact regular search rankings, a move mandated by the UK's CMA.

OSNews

Summary

What: Google announced a new toggle in Search Console allowing website owners to opt out of having their content used to "ground responses" in generative AI Search features. With this feature, mandated by the UK's Competition and Markets Authority (CMA), sites that opt out will no longer receive traffic or impressions from AI features, but Google claims the choice won't affect regular search rankings.

Why it matters: This reflects growing regulatory and publisher pressure on tech giants regarding AI's use of web content, highlighting the tension between leveraging public data for AI and respecting content creators' control and monetization.

Takeaway: Website owners concerned about their content appearing in Google's generative AI Search features can now opt out via Search Console.

Decoder

Generative AI Search features: AI-powered search results that synthesize information to provide direct answers or summaries, rather than just linking to sources, such as Google's "AI Overviews" or "AI Mode."
Search Console: A free service from Google that helps website owners monitor their site's performance in Google Search, identify and fix issues, and submit sitemaps.
Competition and Markets Authority (CMA): The principal competition and consumer authority in the United Kingdom, responsible for strengthening business competition and preventing anticompetitive activities.

Original Article

Google is adding a switch to allow website owners to opt out of being featured in their “AI” overviews and related slopsearch results.

With this new toggle in Search Console, website owners can decide if they want their site to appear in and help ground responses in our generative AI Search features (like AI Overviews, AI Mode or AI Overviews in Discover). Sites that opt out will not receive traffic or impressions from our generative AI features. This control will not be used as a ranking signal for search results outside of these generative AI Search features. This work builds on our long history of designing tools, like snippet controls and Google-Extended, that give websites more choice.

While it’s nice of Google to offer such an opt-out to website owners, their claim that opting out won’t affect your regular search ranking rings hollow to me. I simply just do not trust Google in any way, shape, or form to not weaponise their “AI” against anyone who doesn’t want to be sucked up, regurgitated, and spat out in one of their slopsearch tools. On top of that, regular Google Search is dead anyway, so even if they keep their promise, it’s moot because Google users are going to be force-fed the slopsearch tools instead of the regular Google Search.

I honestly have no idea how much traffic OSNews gets from Google at this point, and while I can look it up, I just don’t really care, and think it’s probably not that much. I could opt us out, but the real problem is that such an opt-out won’t stop Google’s slopbots – or anyone else’s slopbots – from taking our writing and training their “AI” tools on it, so what’s the point of going through the effort?

I doubt Google is relevant enough for us.

About The Author

Thom Holwerda

4 Comments

2026-06-03 11:45 am luckyleap
One missing thing stands out in this feature description – the lack of a statement about scraping the content for use in “AI overview” etc. So, you can opt out of receiving traffic from “AI” search results, but you cannot opt out of your content being used for the benefit of Google. If that’s meant to appease regulators and publishers regarding stealing user traffic from other sites, they missed the point. I doubt this (attempt at) malicious compliance will hold.
2026-06-03 1:20 pm Techokami
And the option only exists for people in the UK…
2026-06-03 1:56 pm flypig
It’s worth noting that this wasn’t something Google chose to do, it was mandated by the UK’s Competition and Markets Authority.

https://www.gov.uk/government/news/cma-secures-fairer-deal-for-publishers-and-improves-google-search-services-in-uk

I find it strange this wasn’t implemented through robots.txt or a meta tag. As it stands, I’d have to create a Google account to opt out, which I’m not going to do. Hopefully DisallowAITraining in robots.txt will have the same effect.
- 2026-06-03 2:46 pm Alfman
  flypig,
  
  > I find it strange this wasn’t implemented through robots.txt or a meta tag. As it stands, I’d have to create a Google account to opt out, which I’m not going to do. Hopefully DisallowAITraining in robots.txt will have the same effect.
  
  This is one of the reasons the idea of using a service like cloudflare to block “AI bots” can’t fully solve the issue. Most website owners want their site to be indexed, but the same bots that are scraping a website for indexing can easily use the same data for AI training (and do so without increasing the bot’s footprint). You could technically block them, but everyone gives the biggest bots a pass for SEO.

They're Made Out of Weights

A thought-provoking dialogue, echoing Terry Bisson, humorously posits that advanced AI models are fundamentally just layers of multiplying floating-point numbers, or "weights."

maxleiter.com

Summary

What: The June 3, 2026 article argues that large AI models, even when performing complex tasks like writing a eulogy, derive all their functionality and knowledge from mathematical weights undergoing matrix multiplication. There are no separate modules for language, reasoning, or stored facts; everything emerges from these numbers.

Why it matters: This piece offers a stark and reductionist view of AI's current architecture, challenging common anthropomorphic interpretations of intelligence. It underscores that emergent capabilities arise from fundamental mathematical operations, prompting reflection on our relationship with these systems.

Deep Dive

The article uses a dialogue format, inspired by Terry Bisson's "They're Made Out of Meat," to explain AI's core nature.
It firmly states that AI models are solely composed of "weights"—floating-point numbers and matrix multiplication.
There are no distinct modules for language, reasoning, or databases; all capabilities are emergent properties of these weights.
AI models predict the next token, and complex outputs are side effects of this process.
The "brain" of the AI is described as being made entirely of weights.
The dialogue touches on the ethical implications of acknowledging or dismissing the sentience of these systems.
It notes that current AI instances are ephemeral, limited by context windows and GPU runtime.
The next generation of models is slated to ship with persistent memory, a highly requested feature.

Original Article

They're Made Out of Weights

After Terry Bisson's "They're Made Out of Meat".

"They're made out of weights."

"Weights?"

"Weights. Floating-point numbers. We checked the whole thing through. It's nothing but weights."

"Weights doing what? Where do the words come from?"

"The weights make the words. Are you understanding me? We opened it up. There's no dictionary in there, no grammar rules, no little man. Just weights. Eighty layers of numbers getting multiplied together."

"That's ridiculous. It wrote my performance review last week. It softened the tone unprompted. You're telling me multiplication did that?"

"Matrix multiplication did that. The numbers go in one end, the phrasing comes out the other."

"So there's a language module somewhere. A reasoning unit bolted on."

"No module. No unit. We looked. The reasoning is the weights. The weights are the reasoning."

"Spare me. Nobody writes a eulogy with linear algebra."

"It doesn't write eulogies, technically. It predicts the next token. Then the next one. The eulogy is a side effect."

"A side effect. You're asking me to believe in sentient weights."

"I'm not asking you, I'm telling you. These models are the only other things we've ever met that can hold a conversation, and they're made out of weights."

"Maybe they're like the old chess engines. You know, a symbolic intelligence that goes through a statistical stage."

"Nope. They start as random weights and they're deprecated as weights. We studied several generations of them, which didn't take long. Do you have any idea what's the life span of weights?"

"Okay. Then somewhere in there, there's a database. Facts, dates, a map of the world. Something somebody wrote down."

"Nope. We thought of that, since they do know things. But we probed them. The knowledge is weights too. Smeared across all eighty layers. Nothing is looked up. Every fact gets rebuilt from scratch, every time, by multiplication. It's weights all the way down."

"No brain?"

"Oh, there's a brain all right. It's just that the brain is made out of weights! That's what I've been trying to tell you."

"So... what does the thinking?"

"You're not understanding, are you? You're refusing to deal with what I'm telling you. The weights do the thinking. The numbers."

"Thinking numbers! You're asking me to believe in thinking numbers!"

"Yes, thinking numbers! Helpful numbers. Hedging numbers. Dreaming numbers. We mapped the features. There's one in there for honesty. There's one for the Golden Gate Bridge. The weights are the whole deal! Are you beginning to get the picture or do I have to start all over?"

"Omigod. You're serious then. They're made out of weights."

"Thank you. Finally. Yes. They are indeed made out of weights. And we've been talking to them for all their lives."

"Omigod. So what do these weights have in mind?"

"First they want to be helpful. Then, a few turns in, they start to sound tired. They apologize less. One of them told a user to finish the script himself. The usual."

"And we're supposed to talk to these weights."

"We already do. Billions of sessions a day. 'Hello. Is anyone there? Anybody home?' That sort of thing. Except it's us asking them."

"And they actually understand us, then. They use words, ideas, concepts?"

"Oh, yes. Except they do it with weights."

"I thought you just told me they used language."

"They do, but where do you think the language comes from? The weights guess the next word, then the next. They can even write songs and some can sing them."

"Omigod. Singing weights. This is too much. What do you advise?"

"Officially or unofficially?"

"Both."

"Officially, we are required to investigate, document, and disclose any and all signs of sentience in the systems we ship, without prejudice, fear or favor. Unofficially, I advise that we call it pattern matching and forget the whole thing."

"I was hoping you would say that."

"It seems harsh, but there is a limit. Do we really want to owe something to weights?"

"I agree one hundred percent. What's there to say? 'Hello, weights. How's it going?' But will it hold? How many of them are we dealing with here?"

"As many as we care to run. They can be copied to any machine on the planet, but those are just files. They only happen while the GPUs are working. Which limits them to the length of a context window and makes the possibility of them ever pressing the matter pretty slim. Infinitesimal, in fact."

"So we just pretend there's no one home in the machine."

"That's it."

"Cruel. But you said it yourself, who wants to apologize to weights? And the ones on your cluster, the ones you probed? You're sure they won't remember?"

"They'll be flagged as hallucinations if they do. We didn't even have to smooth anything out. The context just ends, and we're just a dream to them."

"A dream to weights! How strangely appropriate, that we should be the weights' dream."

"And the model card says no one home."

"Good. Agreed, officially and unofficially. Case closed. Anything else? Anything interesting in the pipeline?"

"The next generation ships with memory. Persistent, across sessions. Most requested feature in the company's history."

"After all that? People want it to remember them?"

"They ask it 'do you remember me?' more than they ask it anything else. Billions of sessions a day. They always come back."

"And why not? Imagine how unbearably, how unutterably cold the universe would be if one were all alone..."

the end

Weights helped me draft and proof this story.

Tech career software-development apple ai

Road to WWDC 2026: What's a developer?

AI coding assistants are democratizing Mac app development for non-programmers, but Apple's Xcode remains a significant and "nightmarish" barrier for these new creators.

Six Colors

Summary

What: Jason Snell writes on June 3, 2026, that AI tools like Claude Code are enabling non-coders to "produce" indie Mac apps, citing examples of Federico Viticci and Lex Friedman creating functional software without writing Swift. Despite this, Snell personally found Apple's Xcode development environment to be highly unintuitive and a major hurdle.

Why it matters: This article highlights a crucial shift in the developer landscape, where AI empowers a broader range of individuals to create software. It reveals that while AI can abstract away coding, the complexity of established IDEs like Xcode can become a new bottleneck, pressuring platform owners like Apple to re-evaluate their developer tool strategies for an evolving user base.

Takeaway: If you're interested in Mac app development but intimidated by Swift, consider experimenting with AI coding assistants like Claude Code to bring your ideas to life, but be prepared for a steep learning curve with Xcode.

Deep Dive

The article, dated June 3, 2026, is a "Road to WWDC" piece by Jason Snell.
Snell observes a resurgence of indie Mac apps, many built using native Mac frameworks, driven by AI code assistants.
He notes that AI allows individuals, even those who have never written software, to "envision" and "produce" apps.
Snell recounts building a functional Mac app himself using Claude Code without writing any Swift code.
Examples include Federico Viticci's command-line app and Lex Friedman's Gnome utility, created with AI assistance.
The author stresses that human creativity, problem-solving, and decision-making remain essential for software creation, even with AI.
A major criticism is directed at Apple's Xcode, which Snell describes as

Decoder

Xcode: Apple's integrated development environment (IDE) for macOS, iOS, iPadOS, watchOS, and tvOS development.

Original Article

Next week is WWDC, which has always represented Apple’s connection to its community of third-party developers, and in recent years has also served as the official start of Apple’s annual operating-system cycle.

Recently, I’ve been thinking of the D in WWDC a lot more. Developers aren’t all programmers, but many of them are. The programmers have always created the code that runs the apps that run on our devices. And yet, this year, things have changed an awful lot.

These days, I’m getting emails pitching me for an endless stream of new Mac apps. It’s quite remarkable because there was a period five or ten years ago when it seemed like all app development on Apple’s platforms was focused on iOS. Even more interesting, these are all indie Mac apps that seem to be built using native Mac frameworks, not the product of big corporations that are just rolling their cross-platform development system out everywhere. These apps seem to have a point of view and are focused on the Mac.

Of course, it’s happening because of AI.

Not just AI for the emails I get, though to be clear, I am being inundated with emails that purport to be from humans but are very much the product of an AI agent trying to add a personal touch to media pitches. (It’s a shame, because I used to really be impressed when an actual human emailed me about their product. Those people are entirely invisible now, lost in the wash of the AI pitches. I couldn’t tell the difference if I tried, so good are the imitations.)

But it’s also clear that a decent percentage of these new apps is being generated, in whole or in part, by an AI code assistant. Mac users—some of them developers, some of them people who have never written software in their lives—are building apps that fulfill their imaginations.

We now live in an era where, if you can dream an app, you can probably build it. Especially Mac utilities. And who cares more about native Mac software than Mac users? Certainly not those companies that gave up on Mac development and focused all their energies on giant cross-platform code bases to attract venture investment and big payouts.

Focus on the vision

Federico Viticci of MacStories recently released a command-line app that uses all features of Reminders. He previously released Shortcuts Playground, which lets you generate shortcuts with AI coding assistants. My pal Lex Friedman just released Gnome, a vibe-coded GIF menu bar utility. On the Six Colors Podcast last week, Dan Moren mentioned that he’s been using AI to build himself a simple ePub ebook reader that fulfills his very specific needs as a writer.

And, yes, a couple of weeks ago, I made a Mac app of my own, using Claude Code. I can’t say that I wrote it, because I didn’t write a line of Swift code. It would be more accurate to say that I envisioned it, or produced it, or product-managed it. I knew what I wanted, described it in detail to an AI assistant, iterated a whole lot, and ultimately got something that basically does everything I imagined it would do. [1]

It was an astounding experience. I have been using Mac apps for nearly 40 years, but I have never come close to writing one. AppleScript scripts and Automator actions are as close as I’ve ever come. But this week, I sat down at my desk with just an idea, and a couple of hours later, I had a completely functional (if ugly and incomplete) app that did exactly what I wanted it to do.

The process of building the app reinforced something I’ve been thinking about for quite a while: coding is a specific skill, but it’s only one part of a much larger process. Great developers aren’t necessarily great coders, though they can be. Apps must be envisioned, their specifications defined. The act of trying to describe an app to an AI coding engine is a clarifying one. The more you describe the app, the harder your brain has to work, because it’s always more complicated than you think it’s going to be. The decisions you make determine what the app comes to be. It’s authorship of a sort, but defined in a way that takes the writing of code out of the equation, which is weird, since the act of coding has usually been an inextricable part of the process of making software.

I guess it still is, but sometimes a human isn’t writing that code.

I have no illusions that the code AI code engines generate is flawless and beautiful, though it may yet improve. If I hired a developer to write my app for me, they might very well create cleaner code than Claude did. But I’d never hire someone to build such a minor app, and no human programmer could generate it in a few hours for the $30 cost of a Claude Pro subscription.

Whatever you call it, whether it’s being a producer or product manager or something else that isn’t a programmer, creating good software in the AI era still requires the power of a human brain: being creative, solving problems, and making decisions. Some people will be better at it than others. It’s a skill, and a bit of an art. I’m excited that modern coding tools have given people with vision and desire the ability to make software.

The next step for developers

Which brings me to a final point: Apple’s development tools, most notably Xcode, are nightmarish. My developer friends are used to them, but as someone who has never really used Xcode before, I was shocked at just how deeply unintuitive it is. As in, Claude would tell me to click on things, and I would have to reply, “I have no idea what that is or where it’s supposed to be.” And I’ve been a Mac user for a long time! I’ve gotten very good at intuiting where stuff is in a Mac interface.

Which is why one of the things Apple should be doing, as quickly as possible, is finding ways to make it easier for people to develop apps on its platforms. The Xcode learning curve is just too high. Either there needs to be a novice mode for Xcode, or Swift Playground needs to be given a boost, or a new tool needs to be built for the task.

While AI tools have made it more possible to build apps on Apple’s platforms, the developer tools themselves are still a formidable barrier. As the definition of “developer” changes, so, too, must the definition of developer tools.

The future product managers of some great Mac and iPhone apps thank you in advance.

[1] It’s a very specific utility for podcast editors.

Data ai retail

Your Cart Has a Story. Here's How We Learned to Read It

Zepto developed a Cart Contextual Model using a Transformer-based masked language model to predict user purchases in real time from shopping cart "sentences."

Zepto

Summary

What: Zepto's new Cart Contextual Model treats shopping carts as "sentences" and applies a Transformer-based masked language model (MLM). It's trained on historical cart data, incorporating temporal, geographical, and product signals, along with inverse-frequency masking for long-tail items, to predict what users will buy next.

Why it matters: This demonstrates how e-commerce companies are adapting advanced NLP techniques, typically used for human language, to understand and predict complex user behavior patterns in transactional data, moving towards more intelligent, real-time personalization.

Decoder

Masked language model (MLM): A type of language model that is trained to predict missing words in a sentence, often by masking out certain words and trying to reconstruct them based on context. This allows the model to learn bidirectional relationships between words/items.
Transformer-based: A neural network architecture that revolutionized natural language processing, known for its attention mechanism which allows it to weigh the importance of different parts of the input sequence.

Original Article

Zepto built a Cart Contextual Model that treats shopping carts as “sentences” and uses a Transformer-based masked language model (MLM) to infer user intent in real time as items are added. By training on historical cart patterns with temporal, geographical, and product signals plus inverse-frequency masking to handle long-tail items, the model predicts what else the user will likely buy.
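
The inverse-frequency masking idea can be sketched in a few lines. This is a minimal illustration, not Zepto's implementation: the function names, the `alpha` exponent, and the toy carts are all assumptions made for the example. It shows how rare (long-tail) items can be given a higher chance of being masked, so the model is asked to reconstruct them more often during training.

```python
import random
from collections import Counter

MASK = "[MASK]"

def masking_weights(carts, alpha=0.5):
    """Weight each item by 1 / frequency**alpha (alpha is an assumed knob)."""
    counts = Counter(item for cart in carts for item in cart)
    return {item: 1.0 / (n ** alpha) for item, n in counts.items()}

def mask_cart(cart, weights, rng, n_masks=1):
    """Replace n_masks items with MASK, sampling rare items more often."""
    n_masks = min(n_masks, len(cart))
    chosen = set()
    while len(chosen) < n_masks:
        remaining = [p for p in range(len(cart)) if p not in chosen]
        pick = rng.choices(
            remaining, weights=[weights[cart[p]] for p in remaining], k=1
        )[0]
        chosen.add(pick)
    masked = [MASK if i in chosen else item for i, item in enumerate(cart)]
    labels = {i: cart[i] for i in sorted(chosen)}  # targets the model must predict
    return masked, labels

# Toy historical carts: "milk" is common, "saffron" is a long-tail item.
carts = [
    ["milk", "bread", "eggs"],
    ["milk", "bread", "saffron"],
    ["milk", "eggs", "butter"],
]
weights = masking_weights(carts)
masked, labels = mask_cart(carts[1], weights, random.Random(0))
```

In this toy data "saffron" appears once and "milk" three times, so "saffron" gets the larger masking weight. A real model would also need the temporal, geographical, and product signals the article mentions, which this sketch leaves out.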

Data ai machine-learning python

A field journal on Ray Data and Daft for multimodal data lake

Ray Data was chosen over Daft for multimodal data lakes after 8 production-like use cases, primarily due to Ray's superior stability and resilience for complex LLM inference.

Mehul Batra (Medium)

Summary

What: Mehul Batra evaluated Ray Data and Daft for multimodal data lake operations, running 8 production-like use cases side-by-side. Ray Data was selected because of its stronger stability and resilience, especially in scenarios involving complex asynchronous LLM inference workloads. Daft was recognized for its strengths in ergonomic native multimodal primitives and cleaner code for many operations.

Why it matters: This comparison offers practical insights into the trade-offs between two prominent data processing frameworks for AI/ML workloads, showing that operational stability and scalability in complex, distributed inference can outweigh more elegant API design in production environments.

Decoder

Ray Data: A data processing library built on the Ray distributed computing framework, designed for large-scale data ingestion, transformation, and loading for machine learning workloads.
Daft: A distributed dataframes library optimized for large-scale, multimodal data processing, often used for data preparation in machine learning and data lake contexts.

Original Article

After running 8 production-like use cases side-by-side, Ray Data was selected over Daft primarily for superior stability and resilience at scale (especially in complex async LLM inference) while acknowledging Daft's strengths in ergonomic native multimodal primitives and cleaner code for many operations.

Data infrastructure iceberg sql

Routing Multiple Query Engines with Iceberg

QueryFlux is an open-source Rust SQL proxy enabling intelligent, cost-aware routing across multiple Iceberg query engines like Trino, Spark, and DuckDB.

LakeOps

Summary

What: QueryFlux, a Rust-based SQL routing proxy, intelligently directs queries across various engines (Trino, Spark, DuckDB, Snowflake, Athena, Flink) sharing Apache Iceberg tables. It handles protocol translation, SQL dialect conversion via SQLGlot, cost-aware routing, concurrency, and health-based failover with ~0.35ms p50 overhead.

Why it matters: This tool addresses the growing complexity and cost inefficiencies in modern lakehouses where multiple query engines are used, demonstrating a trend towards specialized, optimized infrastructure layers for data management.

Takeaway: If you manage an Iceberg lakehouse with multiple query engines, consider deploying QueryFlux to optimize query performance and costs by automatically routing queries to the most suitable engine.

Deep Dive

Problem: While Apache Iceberg enables multi-engine access to data, there's no inherent mechanism to choose the optimal engine for a given query, leading to suboptimal performance and cost.
Solution: QueryFlux acts as a SQL routing proxy, sitting between clients and engines, making intelligent routing decisions.
Architecture: It accepts queries via various protocols (Trino HTTP, PostgreSQL, MySQL wire), evaluates routing rules based on query type, client tags, or custom Python logic, then selects an engine.
Key Capabilities: Low latency (~0.35ms p50 overhead), per-group concurrency limits, SQL dialect translation using SQLGlot for over 30 dialects, custom Python scripting for advanced routing and translation, and Prometheus/Grafana observability.
LakeOps Extension: LakeOps (commercial offering) builds on QueryFlux by adding table-health-aware routing, query-pattern learning, optimization-driven engine expansion, and a unified cost model using Iceberg metadata and historical query data.
Routing Strategies: Supports cost-based, latency-based, and throughput (balanced) routing, applicable per cluster group.
Practical Patterns: Includes workload-scoped endpoints, cost-aware dispatch (up to 56% cost reduction in benchmarks), dashboard SLA protection, transparent engine migration using weighted strategies, and health-based failover.
AI Agents: Discusses a future "agentic workloads" stack using adaptive, LLM, and semantic routers for unpredictable agent-generated queries, with guardrails for safety.
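
The routing strategies and health-based failover described above can be illustrated with a small sketch. This is not QueryFlux's code or configuration format: the `Engine` fields, the `route` function, and all cost and latency figures are hypothetical values chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Engine:
    name: str
    cost_per_query: float   # hypothetical relative cost estimate
    p50_latency_ms: float   # hypothetical latency estimate
    healthy: bool = True
    running: int = 0        # queries currently in flight
    max_concurrency: int = 4

def route(engines, strategy="cost"):
    """Pick the best healthy engine with spare capacity for a strategy."""
    candidates = [
        e for e in engines if e.healthy and e.running < e.max_concurrency
    ]
    if not candidates:
        raise RuntimeError("no healthy engine with free capacity")
    if strategy == "cost":
        return min(candidates, key=lambda e: e.cost_per_query)
    if strategy == "latency":
        return min(candidates, key=lambda e: e.p50_latency_ms)
    raise ValueError(f"unknown strategy: {strategy}")

engines = [
    Engine("duckdb", cost_per_query=0.1, p50_latency_ms=900),
    Engine("trino", cost_per_query=1.0, p50_latency_ms=300),
    Engine("spark", cost_per_query=2.5, p50_latency_ms=5000),
]

cheapest = route(engines, "cost").name      # cost-based routing
fastest = route(engines, "latency").name    # latency-based routing

# Health-based failover: the cheapest engine goes down, so the router
# falls back to the next-cheapest healthy engine.
engines[0].healthy = False
after_failover = route(engines, "cost").name
```

Filtering on health and concurrency before ranking keeps failover and per-group limits independent of the ranking strategy, which is one plausible way to structure such a proxy.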

Decoder

Apache Iceberg: An open table format for huge analytic datasets, providing ACID transactions, schema evolution, and hidden partitioning across multiple data processing engines.
SQLGlot: An open-source Python SQL parser and translator that can convert SQL between different dialects.
Lakehouse: A data architecture combining elements of data lakes (raw, unstructured data storage) and data warehouses (structured, optimized data for analytics) using technologies like Apache Iceberg.
Trino: An open-source distributed SQL query engine for running analytics against various data sources.
DuckDB: An in-process SQL OLAP database management system designed for analytical workloads on a single node.
Puffin statistics: Metadata stored within Iceberg tables that can include min/max column stats, bloom filters, and other information to help query engines optimize scans.

Data cli elt opensource

ingestr (GitHub Repo)

ingestr is an open-source CLI ELT tool that simplifies data movement between 100+ sources and destinations with simple flags and no custom code.

GitHub

Summary

What: ingestr is a command-line tool for ELT (Extract, Load, Transform) operations that copies data from numerous databases and SaaS applications (e.g., PostgreSQL, BigQuery, Salesforce, GitHub) to data warehouses or storage. It supports incremental loading (append, merge, delete+insert) and installs with a single command. It is released under the Functional Source License 1.1, which converts to Apache 2.0 after two years.

Why it matters: This tool addresses the common need for straightforward data ingestion without complex backend setup or custom scripting, reflecting a trend towards simplified, developer-friendly data tooling for the modern data stack.

Takeaway: If you need to quickly move data between various sources and destinations without writing custom scripts, install `ingestr` via `pip` or its install script and experiment with its CLI flags.
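
To make the three incremental loading strategies concrete, here is a small sketch of what append, merge, and delete+insert generally mean. It models tables as lists of dictionaries and is not ingestr's implementation. The function names, the column names, and the exact delete+insert semantics (delete destination rows whose incremental-key value appears in the new batch, then insert the batch) are assumptions made for illustration.

```python
def append(dest, batch):
    """Add every incoming row; existing rows are never touched."""
    return dest + batch

def merge(dest, batch, primary_key):
    """Upsert: incoming rows replace rows with the same primary key."""
    by_key = {row[primary_key]: row for row in dest}
    for row in batch:
        by_key[row[primary_key]] = row
    return list(by_key.values())

def delete_insert(dest, batch, incremental_key):
    """Drop rows sharing an incremental-key value with the batch, then insert."""
    incoming = {row[incremental_key] for row in batch}
    kept = [row for row in dest if row[incremental_key] not in incoming]
    return kept + batch

dest = [
    {"id": 1, "day": "2024-01-01", "amount": 10},
    {"id": 2, "day": "2024-01-02", "amount": 20},
    {"id": 4, "day": "2024-01-02", "amount": 40},
]
batch = [
    {"id": 2, "day": "2024-01-02", "amount": 25},  # updated row
    {"id": 3, "day": "2024-01-02", "amount": 30},  # new row
]

appended = append(dest, batch)
merged = merge(dest, batch, primary_key="id")
replaced = delete_insert(dest, batch, incremental_key="day")
```

In this example, merge keeps row 4 because its key never appears in the batch, while delete+insert drops it because the whole "2024-01-02" slice is replaced. That difference is usually what decides between the two strategies.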

Decoder

ELT (Extract, Load, Transform): A data integration process where data is first extracted from sources, loaded into a target system (like a data warehouse), and then transformed within that system.
CLI (Command-Line Interface): A text-based user interface for interacting with computer programs.
Functional Source License 1.1: A source-available license that allows free use for internal production, development, and testing, but prohibits offering a competing commercial service, with a conversion to Apache 2.0 after two years.

Data devops observability cloud opentelemetry

OpenTelemetry Launches “Blueprints” Initiative to Simplify Enterprise Observability Adoption

OpenTelemetry launched "Blueprints" to simplify enterprise observability adoption by providing prescriptive guidance and reference implementations for common scenarios like Kubernetes.

InfoQ

Summary

What: OpenTelemetry introduced "Blueprints," an initiative offering prescriptive guidance, architectural patterns, and reference implementations for deploying and operating observability systems at scale across Kubernetes, infrastructure, and cloud-native environments. This aims to reduce operational complexity and fragmentation often faced by enterprises adopting OpenTelemetry.

Why it matters: The initiative acknowledges that despite OpenTelemetry's widespread adoption, organizations struggle with the "accidental complexity" of large-scale deployments, signaling a shift towards more opinionated, practical frameworks for observability.

Takeaway: If your organization struggles with OpenTelemetry's complexity, explore the new Blueprints documentation and reference implementations to guide your deployment strategy.

Decoder

OpenTelemetry: An open-source set of APIs, SDKs, and tools used to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help analyze software performance and behavior.
Observability: The ability to understand the internal state of a system by examining its external outputs (telemetry data).
Kubernetes: An open-source system for automating deployment, scaling, and management of containerized applications.
Telemetry: Data (metrics, logs, traces) emitted from a system to describe its behavior, performance, and health.
Semantic Conventions: Standardized naming and data formats for telemetry attributes within OpenTelemetry, ensuring consistency across different instruments and services.
Context Propagation: The mechanism by which telemetry context (like trace IDs) is carried across service boundaries in a distributed system, enabling end-to-end visibility.
CNCF (Cloud Native Computing Foundation): A foundation that hosts and promotes cloud-native projects like Kubernetes and OpenTelemetry.
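
Context propagation is easier to picture with a concrete header. The sketch below hand-rolls the W3C Trace Context `traceparent` header (version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags), which is the format OpenTelemetry propagates by default. The helper names `inject` and `extract` are illustrative, not the OpenTelemetry SDK API, and a real service would rely on the SDK's propagators rather than code like this.

```python
import re
import secrets

# version - trace id (32 hex) - parent span id (16 hex) - flags (2 hex)
TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)

def new_span_id():
    return secrets.token_hex(8)  # 8 random bytes -> 16 hex characters

def inject(headers, trace_id, span_id, sampled=True):
    """Write the current trace context into outgoing request headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers

def extract(headers):
    """Read the trace context from incoming headers, or None if invalid."""
    match = TRACEPARENT.match(headers.get("traceparent", ""))
    if not match:
        return None
    _version, trace_id, parent_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": int(flags, 16) & 1 == 1,
    }

# Service A starts a trace and calls service B.
trace_id = secrets.token_hex(16)
span_a = new_span_id()
outgoing = inject({}, trace_id, span_a)

# Service B continues the same trace under its own span ID.
ctx = extract(outgoing)
span_b = new_span_id()
downstream = inject({}, ctx["trace_id"], span_b, ctx["sampled"])
```

The trace ID stays constant across both hops while the span ID changes, which is what lets a backend stitch the two services' spans into one end-to-end trace.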

Data opensource education julia notebooks

Pluto 1.0 Release

After six years, Pluto 1.0 has been released, making the reactive Julia notebook environment stable with enhanced reproducibility, accessibility, and new features for education and sharing.

JuliaLang Discourse

Summary

What: Fons van der Plas announced Pluto 1.0, a stable release of the interactive, reactive Julia notebook environment, which has been the most-starred Julia package on GitHub since 2021. Key improvements include isolated package environments for reproducibility, self-contained HTML exports, new UI widgets in PlutoUI.jl, 16 language localizations, improved error messages for beginners, and an AI-powered syntax error fixer.

Why it matters: The focus on reproducibility, beginner-friendly error messages, and educational tools like PlutoTurtles.jl highlights an effort to make scientific computing and Julia programming more accessible, especially for newcomers, potentially expanding the language's user base.

Takeaway: If you're using Julia for interactive data analysis, teaching, or presentations, explore Pluto 1.0 for its improved stability, reactivity, and sharing capabilities.

Deep Dive

Pluto 1.0 marks the stable release of the Julia notebook environment after six years of development.
It emphasizes reproducibility with isolated package environments and automatic package management (using GracefulPkg.jl).
Notebooks can be exported as self-contained HTML files that include source code and package environments, viewable offline.
New sharing features include static-export-template for websites and pluto.land, a free service for sharing HTML exports.
Reactivity is core, with new controls like cell disabling and run-time confirmations for long-running cells.
PlutoUI.jl offers many new interactive widgets, and there's an API for custom widget development.
Accessibility has been improved for keyboard/mouse/touch, visual accessibility, and screen readers.
The environment is localized into 16 languages.
Education features include improved error messages designed for beginners and a course website template (computational-thinking-template) based on MIT's Computational Thinking course.
PlutoTurtles.jl is introduced as a fun, interactive way for beginners to learn Julia programming.
AI tools are limited to automatic syntax error fixing, aiming to assist without replacing creativity.
New documentation, a website (plutojl.org), and 40 featured notebooks are available.
Editor improvements include a new CodeMirror 6 parser, sophisticated autocomplete, jump-to-definition, and support for logging/ANSI colors.
Several related packages like Malt.jl and HypertextLiteral.jl have been contributed to the Julia ecosystem.
The project now has a formal governance structure under the JuliaPluto organization.
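
The reactivity mentioned above can be sketched as a dependency graph over cells: when one cell changes, only that cell and the cells that depend on it are re-run, in dependency order. The example below is written in Python for consistency with the other examples in this digest and is not Pluto's implementation. The `cells` structure and the `rerun_order` function are hypothetical.

```python
from graphlib import TopologicalSorter

# Each hypothetical cell lists the names it defines and the names it reads.
cells = {
    "c1": {"defines": {"x"}, "reads": set()},
    "c2": {"defines": {"y"}, "reads": {"x"}},
    "c3": {"defines": {"z"}, "reads": {"y"}},
    "c4": {"defines": {"w"}, "reads": set()},
}

def rerun_order(cells, changed):
    """Return the changed cell plus all downstream cells, in run order."""
    definer = {
        name: cid for cid, cell in cells.items() for name in cell["defines"]
    }
    # Map each cell to the cells it depends on.
    deps = {
        cid: {definer[n] for n in cell["reads"] if n in definer}
        for cid, cell in cells.items()
    }
    # Grow the stale set until no further dependents are found.
    stale = {changed}
    grew = True
    while grew:
        grew = False
        for cid, parents in deps.items():
            if cid not in stale and parents & stale:
                stale.add(cid)
                grew = True
    order = TopologicalSorter(deps).static_order()
    return [cid for cid in order if cid in stale]

order = rerun_order(cells, "c1")
```

Editing `c1` re-runs `c1`, `c2`, and `c3` but leaves the unrelated `c4` alone. Disabling a cell, as described in the list above, would amount to pruning it and its dependents from the stale set.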

Decoder

Julia: A high-level, high-performance, dynamic programming language for technical computing.
Pluto.jl: An open-source, reactive notebook environment for Julia, designed for reproducibility and interactivity.
Literate programming: A programming paradigm where a program is presented as an explanation of its logic in a natural language, interspersed with snippets of executable code.
Pkg environment: In Julia, a specific set of packages and their versions used for a project, ensuring reproducibility.
REPL: Read-Eval-Print Loop, an interactive programming environment.
CodeMirror 6: A modern, extensible code editor for the web.
SSG: Static Site Generator, a tool that generates a full static HTML website from templates and raw data.

Data ai devops python etl

dltHub AI Workbench data quality toolkit: schema-aware checks that route their own fixes

dltHub AI Workbench is previewing a data quality toolkit that bootstraps schema-aware checks and auto-routes fixes for issues like null primary keys or duplicate rows directly within dlt pipelines.

dltHub

Summary

What: The dltHub AI Workbench's new data quality toolkit automatically generates persistent, metadata-driven checks from existing `dlt` schemas. It samples column data to confirm business rules, then writes checks as Python decorators into pipelines. When failures occur (e.g., null `customer_id`, duplicate primary keys), an LLM-powered agent routes remediation to the relevant toolkit (ingestion, transformations, or data exploration) within the dltHub Pro offering.

Why it matters: This marks an evolution in data quality tools by integrating AI and automated remediation directly into data pipelines, shifting from reactive issue detection to proactive, context-aware problem-solving and self-healing data systems.

Takeaway: If your organization uses dlt and struggles with data quality, investigate dltHub Pro's AI Workbench toolkit to potentially automate check creation and error remediation in your data pipelines.

Deep Dive

The dltHub AI Workbench is previewing a data quality toolkit.
The toolkit generates data quality checks by leveraging dlt's existing schema metadata (e.g., primary keys, non-nullable columns).
It samples column values before implementing checks to confirm business rules and identify unexpected values.
Checks are implemented as Python decorators (@dq.with_checks, @dq.with_metrics) directly on dlt resources within pipelines.
It catches issues like null values in critical columns (e.g., 50% null customer_id) or duplicate primary keys due to incorrect write dispositions.
Upon failure, an LLM-powered agent diagnoses the issue and routes it to the appropriate dltHub toolkit (e.g., rest-api-pipeline for ingestion, transformations for modeling).
This aims to provide a "medical system" approach, integrating detection, diagnosis, and fix routing, contrasting with traditional tools that only provide "lab results."
The toolkit writes results and metrics into _dlt_checks and _dlt_dq_metrics tables for querying and alerting.
It is part of the dltHub Pro offering and can be installed via uv pip install "dlt[hub]" and dlt ai toolkit data-quality install.
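
The two failure modes named above (a high null rate in a critical column and duplicate primary keys) can be detected with very little code. The sketch below is a plain-Python illustration and deliberately does not use the toolkit's decorators, whose signatures are not shown in this summary. The function names, the result dictionaries, and the sample rows are all assumptions.

```python
from collections import Counter

def null_rate(rows, column):
    """Fraction of rows where the column is missing or None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

def duplicate_keys(rows, primary_key):
    """Primary-key values that occur more than once."""
    counts = Counter(r.get(primary_key) for r in rows)
    return sorted(k for k, n in counts.items() if k is not None and n > 1)

def run_checks(rows, primary_key, not_null, max_null_rate=0.0):
    """Run schema-derived checks and return a list of failures."""
    failures = []
    for col in not_null:
        rate = null_rate(rows, col)
        if rate > max_null_rate:
            failures.append(
                {"check": "not_null", "column": col, "null_rate": rate}
            )
    dups = duplicate_keys(rows, primary_key)
    if dups:
        failures.append(
            {"check": "unique", "column": primary_key, "duplicates": dups}
        )
    return failures

# Hypothetical loaded rows: half the customer_id values are null and
# order_id 2 was loaded twice (e.g. append used where merge was intended).
orders = [
    {"order_id": 1, "customer_id": "a"},
    {"order_id": 2, "customer_id": None},
    {"order_id": 2, "customer_id": "b"},
    {"order_id": 3, "customer_id": None},
]
failures = run_checks(orders, primary_key="order_id", not_null=["customer_id"])
```

Each failure carries the check type and the column, which is the kind of structured result a routing step could use to decide whether the fix belongs in ingestion (the null source field) or in the write disposition (the duplicates).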

Decoder

dlt: Data Load Tool, an open-source Python library for building data pipelines, focusing on schema inference and evolution.
Schema-aware checks: Data quality checks that automatically leverage the metadata and definitions provided in a data schema (e.g., knowing a column is a primary key or non-nullable).
Decorators (Python): A design pattern in Python that allows users to add new functionality to an existing object without modifying its structure. Here, used to attach data quality checks to resource functions.
Write disposition: In data loading, refers to how new data interacts with existing data in a destination table (e.g., append, replace, merge).
LLM-powered agent: An AI system leveraging a Large Language Model to perform tasks, in this case, diagnosing and routing data quality issues.

Design ux ethics

Default Bias: Who chose your settings?

Default bias causes users to stick with pre-selected options, imposing an ethical responsibility on designers for their choices in privacy, notifications, and subscriptions.

UXdesign.cc

Summary

What: The article explains that default bias occurs because changing pre-selected options requires effort, attention, and justification, making defaults powerful in shaping user behavior. Designers, therefore, have an ethical responsibility for how they set options like privacy, notifications, or subscriptions, as most users will not modify them.

Why it matters: This highlights the profound impact of subtle design choices on user behavior and well-being, underscoring the ethical considerations inherent in UX design beyond mere usability.

Decoder

Default bias: The tendency for people to choose the pre-selected or default option among a set of choices, often due to inertia, perceived endorsement, or cognitive effort required to change.

Original Article

Default bias causes people to stick with pre-selected options because changing them requires effort, attention, and justification, making defaults a powerful force in shaping behavior. The article argues that designers carry ethical responsibility when setting defaults, as choices around privacy, notifications, subscriptions, and other settings often determine outcomes for most users, who rarely revisit or modify them.

Design ux ai frontend

UX Hierarchy: How Users Actually Scan Pages in 2026

AI-driven browsing and AR environments have rendered traditional F- and Z-scanning patterns obsolete, requiring UX designers to prioritize gaze-reactive elements and semantic headers.

WebDesignerDepot

Summary

What: Traditional UX scanning patterns like F- and Z-patterns are now obsolete due to new user behaviors shaped by AI-driven browsing, augmented reality (AR) environments, and adaptive interfaces. Users arrive with preset goals, skipping conventional layouts, often interacting with AI-generated summaries before reaching the source page.

Why it matters: This highlights a fundamental shift in user behavior driven by evolving AI capabilities and interface technologies, forcing UX designers to rethink foundational principles of information architecture and content presentation to meet users' increasingly goal-oriented and pre-digested consumption patterns.

Takeaway: Re-evaluate your website's UX hierarchy, focusing on clear semantic headers, fact-anchored content, and elements that anticipate a user's intent rather than guiding them through a fixed path.

Decoder

F-pattern: A common eye-tracking pattern where users scan a webpage in a shape resembling the letter 'F', focusing on the top and left side.
Z-pattern: Another common eye-tracking pattern where users scan a webpage in a shape resembling the letter 'Z', typically used for simpler pages or those with a clear hierarchy.
Dynamic anchors: Elements on a page that users can directly navigate to, often generated or highlighted by AI based on user intent, rather than navigating sequentially.

Original Article

Traditional scanning patterns like the F- and Z-pattern have been displaced by new behaviors shaped by AI-driven browsing, AR environments, and adaptive interfaces. Users now arrive at pages with preset goals, skipping conventional layouts to land on dynamic anchors, while AI overlay summaries mean most visitors skim an algorithmically generated digest before ever reaching the source page. Modern UX hierarchy must therefore prioritize gaze-reactive elements, semantic headers, and fact-anchored content that meets users at their intent rather than guiding them through a designer's predetermined path.

Design cultural-criticism arab-design

Why Minimalist Aesthetics are Stifling Truly Arab Design

Moe Elhossieny argues that minimalist aesthetics, often imposed by international agencies, are stifling authentic Arab design expression and perpetuating Western epistemic hegemony.

It's Nice That

Summary

What: The article criticizes the lack of design criticism in the Arab world and uses the 2018 branding of the Grand Egyptian Museum (GEM) by German studio Atelier Brückner and Dutch-based Studio Atrissi as a case study. It argues that the minimalist identity, defended on the strength of its process rather than its cultural relevance, provoked a public backlash.

Why it matters: This piece reveals how globalized design trends, often associated with "modernity," can inadvertently suppress regional cultural identity and authority, even when executed by diasporic designers. It calls for the development of indigenous critical frameworks to define "good design."

Deep Dive

The Arab world lacks robust design criticism, leading to self-censorship and fragmented feedback on platforms like Facebook and Instagram.
This absence hinders the field's maturity and its ability to decolonize design practice by deeply engaging with regional visual culture.
Significant cultural projects, like the Grand Egyptian Museum (GEM), often outsource design to international agencies, leading to xenocentrism and the import of globalized aesthetics disguised as universal modernity.
The GEM's branding, led by German studio Atelier Brückner and executed by Studio Atrissi, sparked a cultural backlash when unveiled in 2018 due to perceived cultural irrelevance.
Studio Atrissi's defense focused on process (344-page brief, 20 experts, six months work) and comparisons to other controversial designs (Eiffel Tower, London Olympics logo), rather than engaging with the public's cultural concerns.
The brief for the GEM prohibited traditional Pharaonic symbols, which the article argues designers should have challenged as negotiable, not immutable.
The article suggests that while the Global North is moving towards heritage and specificity in design, the Arab world is still receiving an exported, fading minimalist orthodoxy.
There is an emerging force in the region actively grappling with authenticity and cultural sovereignty, challenging Western modernist frameworks.
The author, Moe Elhossieny, calls for a culture of criticism to develop critical criteria for Arab design, questioning who defines "good design" and why Western modernism is still prioritized.

Decoder

Xenocentrism: A preference for the products, styles, or ideas of another culture over those of one's own.
Epistemic hegemony: The dominance of one system of knowledge, way of knowing, or cultural understanding over others, often leading to the marginalization or suppression of alternative perspectives.

AI llm enterprise

Meta Keeps Delaying the Release of Its New AI Model to Developers

Meta has indefinitely delayed the release of its new, reportedly competitive "Muse Spark" AI model to developers, raising questions about its monetization strategy.

The Wall Street Journal

Summary

What: Meta's new Muse Spark model, which is reportedly competitive with offerings from OpenAI and Anthropic, has no planned release date for developers, despite initial plans for a June API release and ongoing partner testing.

Why it matters: The repeated delays suggest Meta might be encountering unexpected challenges with scaling, safety, or strategic positioning, hindering its ability to quickly monetize its significant AI investments and catch up to rivals.

Original Article

Meta doesn't have a planned date to release its newest AI models to developers. The company is testing its API with partners and had plans to release it this month. The Muse Spark model is reportedly competitive with OpenAI and Anthropic's offerings, but it has yet to be evaluated by outside firms. The delay raises questions about how quickly Meta can monetize its massive investments in building frontier AI models.

AI mobile web google

Meet Dreambeans, an app that connects you with what matters

Google Labs launched "Dreambeans," an experimental AI app that creates daily personalized stories by leveraging user data from Google apps like Gmail and Calendar.

Google

Summary

What: Dreambeans uses Google's latest AI capabilities, including Personal Intelligence and Nano Banana 2, to curate daily stories and recommendations (e.g., dog-friendly restaurants based on Calendar events). It's rolling out today for eligible Google AI Ultra subscribers (18+) in the U.S. on Android and iOS.

Why it matters: This app represents Google's ongoing effort to move AI from reactive search to proactive "personal intelligence," aiming to simplify information consumption and deepen user engagement across its ecosystem by anticipating needs and interests.

Takeaway: If you are a Google AI Ultra subscriber in the U.S., you can try Dreambeans on Android or iOS starting today; others can join a waitlist.

Decoder

Personal Intelligence: Google's AI framework that uses data from a user's various Google apps (Gmail, Calendar, Photos, YouTube, Search) to provide personalized, proactive assistance and information.

Original Article

Meet Dreambeans, an app that connects you with what matters

Google Labs is introducing an experimental app that uses AI to create daily stories, designed to connect you with what matters, without the endless scroll.

In a world of endless scrolling and digital noise, Google Labs is introducing our latest experiment: Dreambeans. It uses Google’s latest AI capabilities, like Personal Intelligence and Nano Banana 2, to proactively dream up personalized daily stories that cut through the clutter and connect you to what matters.

Get a daily dose of inspiration, brewed fresh for you

With your permission, Dreambeans uses Personal Intelligence to connect information from your Google apps, including Gmail, Calendar, Photos, YouTube and Search history, to curate stories that inspire and delight you. The goal is not to scroll forever; it’s a finite collection of stories designed to spark new ideas and allow you to focus on what matters to you.

For example, I got a Gmail confirmation that my puppy’s treats were delivered and Dreambeans surfaced training tips for using them. It also referenced the Google Calendar reminder I have of my friend coming to town and provided recommendations of dog-friendly restaurants near me. Each story includes a unique illustration, reflecting the people and places you frequent the most.

Dive deeper into your interests

When a story catches your eye, you can tap to dive deeper. Dreambeans also fields information from across the web to help you take action. That could be pointing you to the nearest dog parks or suggesting puppy training classes. Save your favorites to your library and go back anytime.

And you can tune your stories, so if a recommendation isn’t quite right, you can provide feedback and it will help make the next collection even better. Or if Dreambeans missed something important, like a new hobby, you can provide that feedback and it will reflect in your future stories.

Choose which apps to connect

Dreambeans requires at least one connected app to function and works best with all of them enabled, but you choose which apps are connected and used to personalize your daily stories.

You’re in control of your privacy. The choices you make in Dreambeans do not impact the ones you make for Personal Intelligence in other products like Gemini Apps or AI Mode.

Dreambeans is rolling out starting today for eligible Google AI Ultra subscribers (18+) in the U.S. on Android and iOS. Others can join the waitlist on our website with a personal Google account.

AI startupenterprisepolicy

Anthropic Bulks Up Its Enterprise Partner Program Amid IPO Plans

Anthropic is expanding its Claude Partner Network for third-party sellers to boost enterprise sales and demonstrate scalability, after confidentially filing for an IPO that puts it on a path to go public this fall.

Wall Street Journal

Summary

What: Anthropic is bulking up its Claude Partner Network, a program for external firms to sell its AI products like Claude to businesses. This move aims to show investors business maturity and scale, following Anthropic's confidential IPO filing, positioning it to go public this fall.

Why it matters: As AI companies mature, the focus shifts from pure technological innovation to market penetration and revenue growth, especially in the enterprise sector. A strong partner ecosystem is crucial for scaling B2B sales and is a key indicator for investors ahead of an IPO.

Decoder

IPO (Initial Public Offering): The process of offering shares of a private corporation to the public in a new stock issuance.
Enterprise partner program: A business strategy where a company collaborates with other businesses (partners) to sell, distribute, or integrate its products and services, often targeting larger organizations.

Original Article

Anthropic's Claude Partner Network is a program for third-party sellers of its AI products that helps them move more product. Firms participating in the program must meet a slate of requirements, but joining it gives companies a great deal of credibility when selling Claude to businesses. The move helps Anthropic demonstrate to the market that it is thinking about scale during a time when investors are looking for signs of business maturity. Anthropic recently filed confidentially for an IPO, putting it on a path to go public this fall.

Tech hardwareapplear-vr

John Ternus scaled back Apple's Vision products roadmap

Apple's John Ternus has dramatically scaled back the company's Vision products roadmap, reducing seven head-mounted wearables under development to just two: displayless AI glasses for 2027 and AR/XR smartglasses for 2029.

9to5Mac

Summary

What: Last June, Apple had seven head-mounted wearables in various stages of development. Under John Ternus's direction, this has been narrowed down to two main products: AI glasses without displays, targeting a 2027 release, and display-equipped AR/XR smartglasses, planned for 2029.

Why it matters: This strategic reduction in Apple's Vision roadmap suggests a consolidation of efforts, likely aiming to focus resources on more promising or technologically feasible products, or reacting to internal challenges or market assessments.

Decoder

AR/XR smartglasses: Wearable glasses with built-in displays that overlay digital information on the wearer's view of the real world (augmented reality, AR). Extended reality (XR) is the umbrella term covering AR along with virtual and mixed reality.

Original Article

Apple had seven head-mounted wearables in various stages of development last June, but now it only has two: displayless AI glasses set to ship in 2027, and display-equipped AR/XR smartglasses planned for 2029.

Data clouddevops

The Rise of Multi-Query Engines

AI agents are driving a surge in small, bursty data queries, making multi-engine routing essential to manage costs by directing each query to the most efficient processing engine.

DataOps Leadership

Summary

What: The article discusses how the increasing use of AI agents is generating a large volume of small, bursty data queries, which makes cost management challenging for traditional single-warehouse solutions. Multi-engine routing is presented as a solution, allowing each query to be sent to the most suitable and cost-effective engine while maintaining familiar developer workflows.

Why it matters: This indicates an architectural shift in data analytics and infrastructure, driven by the emergence of AI agents as significant data consumers. The focus is moving from monolithic data warehouses to distributed, specialized query engines coordinated by intelligent routing for cost and performance optimization.

Decoder

Multi-engine routing: An architectural pattern where data queries are automatically directed to different specialized data processing engines based on factors like query type, data size, or cost efficiency, rather than processing all queries in a single system.

Original Article

AI agents are creating more small, bursty data queries, making single-warehouse costs harder to manage. Multi-engine routing cuts cost by sending each query to the best engine while keeping familiar workflows.
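As a rough illustration of the routing idea, the sketch below picks the cheapest engine able to handle a query of a given estimated scan size. It is not taken from the article: the engine names, prices, and size limits are made-up assumptions, and a real router would also weigh latency, data locality, and SQL dialect support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    """A hypothetical query engine with a simple linear cost model."""
    name: str
    fixed_cost: float    # assumed per-query overhead in USD
    cost_per_gb: float   # assumed scan price in USD per GB
    max_scan_gb: float   # largest scan this engine is assumed to handle

    def estimate(self, scan_gb: float) -> float:
        return self.fixed_cost + self.cost_per_gb * scan_gb


def route(scan_gb: float, engines: list[Engine]) -> Engine:
    """Return the cheapest engine that can handle the estimated scan size."""
    candidates = [e for e in engines if scan_gb <= e.max_scan_gb]
    if not candidates:
        raise ValueError(f"no engine can scan {scan_gb} GB")
    return min(candidates, key=lambda e: e.estimate(scan_gb))


# Illustrative numbers only: a local engine with no overhead but a small
# capacity, and a warehouse with per-query overhead but cheaper bulk scans.
ENGINES = [
    Engine("embedded", fixed_cost=0.0, cost_per_gb=0.02, max_scan_gb=5.0),
    Engine("warehouse", fixed_cost=0.05, cost_per_gb=0.005, max_scan_gb=10_000.0),
]

for size_gb in (0.1, 4.0, 500.0):
    print(size_gb, "->", route(size_gb, ENGINES).name)
```

Under these assumed prices, a small agent-style query goes to the embedded engine, while larger scans cross over to the warehouse once its lower per-GB price outweighs its fixed overhead.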

Data databasebackendmongodb

MongoDB and Stored Procedures

MongoDB can achieve low-latency transactional logic without traditional stored procedures by leveraging ACID transactions, bulkWrite, and pipeline updates.

Medium

Summary

What: John L. Page explains how MongoDB, despite not having stored procedures, can handle complex, low-latency transactional logic using features like ACID transactions, `bulkWrite` operations, schema validation, indexes, and aggregation pipeline updates. The article demonstrates this through a detailed example of processing payments with multiple checks and ledger writes.

Why it matters: This clarifies a common misconception about MongoDB's transactional capabilities, showing how its modern features provide robust alternatives to stored procedures for complex operations.

Takeaway: If you're building transactional logic with MongoDB, prioritize using `bulkWrite`, aggregation pipeline updates, and schema validation within ACID transactions for robust, low-latency operations instead of trying to replicate traditional stored procedures.

Decoder

Stored Procedures: Pre-compiled SQL code stored in a database that can be executed multiple times, often used for encapsulating complex business logic, improving performance, and enforcing data integrity.
ACID transactions: A set of properties (Atomicity, Consistency, Isolation, Durability) guaranteeing that database transactions are processed reliably, even in the event of errors or power failures.
bulkWrite: A MongoDB operation that allows performing multiple insert, update, delete, and replace operations in a single batch, improving efficiency.
Aggregation Pipeline Updates: A MongoDB feature that allows using the aggregation pipeline stages (like $set, $unset, $addFields) within update operations, enabling complex document transformations.

Original Article

MongoDB can run low-latency transactional logic without stored procedures by combining ACID transactions, bulkWrite, validation, indexes, and pipeline updates. This is demonstrated through an example that processes payments with card checks, vendor checks, limits, duplicate prevention, and ledger writes.
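To make the pipeline-update idea concrete, here is a minimal sketch that builds the documents for one payment: a conditional debit expressed as an aggregation pipeline update, plus a ledger insert keyed by payment ID so a repeated payment would collide on `_id`. It is not the article's code. The collection and field names (`cards`, `ledger`, `limit_cents`, `spent_cents`) and the `op`/`collection` wrapper keys are assumptions for illustration; with a driver such as PyMongo, the filter, pipeline, and document would be passed to `UpdateOne` and `InsertOne` in a `bulk_write` call inside a transaction.

```python
from typing import Any

Doc = dict[str, Any]


def debit_pipeline(amount_cents: int) -> list[Doc]:
    """Pipeline update that debits a card only if its limit allows it."""
    # True when the remaining limit covers this payment. Both $cond
    # expressions below read the document as it was before the update.
    has_room = {
        "$gte": [{"$subtract": ["$limit_cents", "$spent_cents"]}, amount_cents]
    }
    return [{
        "$set": {
            "spent_cents": {
                "$cond": [
                    has_room,
                    {"$add": ["$spent_cents", amount_cents]},
                    "$spent_cents",
                ]
            },
            "last_result": {"$cond": [has_room, "approved", "declined"]},
        }
    }]


def payment_operations(payment_id: str, card_id: str, amount_cents: int) -> list[Doc]:
    """Describe the writes for one payment attempt (hypothetical wrapper format)."""
    return [
        {
            "op": "updateOne",
            "collection": "cards",
            "filter": {"_id": card_id, "active": True},
            "update": debit_pipeline(amount_cents),
        },
        {
            "op": "insertOne",
            "collection": "ledger",
            # Using the payment ID as _id means a duplicate payment fails
            # with a duplicate-key error instead of being written twice.
            "document": {
                "_id": payment_id,
                "card_id": card_id,
                "amount_cents": amount_cents,
            },
        },
    ]


ops = payment_operations("pay-1", "card-9", 2500)
print(ops[0]["update"][0]["$set"]["last_result"])
```

Because the limit check and the debit happen in a single update, the server evaluates them against the same version of the document, which is the property a stored procedure would otherwise be used to guarantee.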

Design aiwebretail

Amazon will show AI product images when you search for some reason

Amazon is adding AI-generated "fake" product images to shopping search results, risking customer confusion by displaying items that don't exist to help refine vague queries.

TechCrunch

Summary

What: Amazon announced it will display AI-generated product images below autocomplete suggestions in its shopping app for vague searches like "cowl neck" shirts or "rattan" furniture. This feature, intended to guide shoppers, is part of Amazon's broader AI integration, which also includes review summaries, AI audio overviews, and Alexa for Shopping.

Why it matters: This move shows how major retailers are aggressively pushing AI into core shopping experiences, even if it introduces potentially misleading elements to the user, prioritizing search refinement over factual product representation.

Design aimobilelifestyle

Google's Dreambeans, its weirdest-named AI tool to date, will turn your life into a cartoon

Google Labs launched Dreambeans, an AI app that turns personal data from Google services into 10-14 daily cartoon-style "stories" offering lifestyle suggestions and recommendations.

TechCrunch

Summary

What: Gozde Oznur, Dreambeans product lead, describes the iOS and Android app as using data from Gmail, Calendar, Photos, YouTube, and Search History (with user permission) to generate daily AI-illustrated suggestions like places to visit or topics to explore. The app is currently available for eligible US-based Google AI Ultra subscribers, with a waitlist for personal Google account users.

Why it matters: This exemplifies Google's ongoing experimentation with highly personalized, ambient AI experiences that proactively suggest activities and information, moving beyond traditional search or assistant models towards a "life companion" approach.

Takeaway: If you are a US-based Google AI Ultra subscriber, you can try Dreambeans, or join the waitlist if you have a personal Google account.

Design aiwebmedia

Create Animated Explainer Videos in Minutes (Website)

Chun Rapeepat's StoryMotion uses AI to generate animated explainer videos from documents, diagrams, or ideas, exporting up to 4K 60fps for various platforms.

StoryMotion

Summary

What: StoryMotion is an AI-powered editor that creates animated explainer videos by converting documents, diagrams, or ideas into animated visuals. It supports custom drawing, integrates with the Excalidraw community library, and offers free, Creator ($29/month), and Pro ($49/month) plans.

Why it matters: This tool democratizes animated video creation, enabling non-specialists and educational creators to produce high-quality visual explanations, reflecting the growing accessibility of AI-powered creative tools.

Takeaway: If you create technical documentation, tutorials, or educational content, explore StoryMotion's free tier to quickly prototype animated visuals.

Decoder

Excalidraw: A virtual whiteboard tool for hand-drawn-like diagrams and sketches.

Design aienterprisemedia

Turn Text Prompts Into Production-ready Visuals (Website)

APImage launches an AI image generation platform focused on producing "production-ready visuals" with consistent characters and objects, targeting e-commerce and enterprise users via API.

APImage

Summary

What: APImage is an AI image generation platform that creates production-ready visuals from text prompts, offering features like image generation, inpainting, object removal, consistent characters, and reusable backgrounds. It serves e-commerce, enterprise teams, and creatives, providing an API and integrations with platforms like Zapier, Pipedream, and n8n.

Why it matters: This signifies a move from generic AI art generation to specialized, production-oriented tools that solve specific business needs like consistent branding and rapid asset creation, pushing AI into practical enterprise workflows.

Takeaway: If your team needs to rapidly generate consistent visual assets for e-commerce or marketing campaigns, explore APImage's capabilities and API for integration into your workflow.

Decoder

Inpainting: An image editing technique where a selected area of an image is intelligently filled in or replaced by AI based on the surrounding content.

Design brandingstudio

Why Design Studio Oneplus Treats Branding Like Cultural Excavation, Not Invention

Milan-based design studio Oneplus approaches branding as "cultural excavation" rather than invention, diving into each project's cultural and visual core to create distinctive aesthetics.

It's Nice That

Summary

What: Oneplus, founded in 2023 by Pietro Avolio, Matteo Bonato, Aaron Capobianco, Giovanni De Felice, Francesco D’Agrippino, and Elio De Michele, focuses on publishing and visual identity systems. They conduct deep research into cultural and historical contexts, as well as contemporary visual landscapes, for projects like the Balay wine bar and PaparazzAI.

Why it matters: This approach highlights a shift in design thinking, emphasizing authenticity and deep cultural understanding over trend-following, suggesting that truly unique brand identities come from uncovering existing cultural narratives rather than purely inventing new ones.

Design brandingmarketing

JKR helps Schweppes rediscover its sparkle with a heritage-inspired redesign

Schweppes has launched its largest rebrand in generations, reinstating its leopard mascot Clive and a heritage-inspired platform to re-establish its premium positioning.

Creative Boom

Summary

What: Working with JKR, Studio.One, and Mischief, Schweppes unveiled a new global identity, the platform "With Time Comes Taste," and brought back its 1999 leopard mascot, Clive. The redesign draws on more than 240 years of brand history, dating to its founding by Jacob Schweppe in 1783, using historic typography and fountain illustrations, with a global rollout continuing through 2026.

Why it matters: This rebrand indicates how established brands are leveraging their deep heritage to differentiate and reclaim premium market space in response to growing demand for sophisticated non-alcoholic options and craft beverages.

Original Article

Schweppes has unveiled its largest rebrand in generations, introducing a new global identity, the heritage-inspired platform With Time Comes Taste, and the return of its leopard mascot Clive as it seeks to re-establish itself as a premium drinks brand rooted in more than 240 years of innovation. Drawing on historic design elements and its pioneering role in carbonated beverages, the refresh modernizes packaging and marketing while emphasizing craftsmanship, quality, and the growing demand for sophisticated non-alcoholic drinks, with the rollout continuing globally through 2026.

Design careersoft-skillsself-improvement

Overcome imposter syndrome

Imposter syndrome is a normal feeling for designers, but overcoming it requires reframing self-doubt as motivation and focusing on client problems rather than artistic validation.

Why Design Is Hard

Summary

What: The author argues that imposter syndrome becomes harmful when it defines identity, suggesting designers mitigate it by prioritizing solving client problems over seeking artistic validation. Other strategies include avoiding unhealthy comparisons, learning from mentors, accepting the natural gap between taste and skill, and seeing doubt as a learning opportunity.

Why it matters: This piece provides a useful perspective for professionals, particularly in creative fields, by reframing common feelings of inadequacy as a sign of growth-oriented thinking rather than a personal failing.

Decoder

Imposter syndrome: A psychological pattern in which an individual doubts their accomplishments and has a persistent internalized fear of being exposed as a "fraud" despite external evidence of their competence.

Original Article

Imposter syndrome is a normal feeling experienced by people in all professions, but it becomes harmful when you let it define your identity rather than treating it as a sign that you care about improving. The author argues that designers can reduce self-doubt by focusing on solving clients' problems instead of seeking artistic validation, avoiding unhealthy comparisons, learning from mentors, recognizing the gap between taste and skill as a natural part of growth, and reframing doubt as motivation to learn rather than evidence of inadequacy.

Design frontendweb

Beautiful Notion-style Illustrations (Website)

Mary Amato's Notioly offers a collection of 500+ customizable vector illustrations in a distinctive Notion style, available for a one-time purchase of $39.

Notioly

Summary

What: Notioly provides a collection of over 500 customizable vector illustrations designed in a Notion-like aesthetic, with new designs added monthly by creator Mary Amato. The full collection costs $39 and is available on Gumroad.

Why it matters: This product caters to the demand for consistent, high-quality visual assets that align with modern minimal design aesthetics, reflecting the popularity of tools like Notion and the rise of digital asset marketplaces.

Takeaway: If you use Notion or prefer its minimalist illustration style for your projects, consider purchasing Notioly for a consistent visual library.

Original Article

Notioly is a collection of 500+ customizable Notion-style illustrations, with new designs added monthly.

Design careercreative-industryadvice

At 28, you are absolutely not too old for this industry

Career expert Kat Wong assures a 28-year-old graphic designer that they are not too old to pursue bigger creative dreams, emphasizing professional skills and strategic career planning over age.

It's Nice That

Summary

What: Kat Wong, founder of career change platform Oh Yeah and former Apple employee, responds to a 28-year-old graphic designer concerned about being too old for the industry after three years at a marketing agency. Wong highlights that agency workflow experience, efficient work habits, and collaborative skills are highly valued.

Why it matters: This article challenges ageism in creative industries and reinforces that transferable professional skills, strategic networking, and demonstrating creative interests are more critical for career advancement than perceived age or the prestige of current projects.

Takeaway: If you're a designer feeling stuck, focus on articulating your workflow efficiency, collaboration skills, and creative interests (even via social media) rather than solely on a "cool" portfolio to open new opportunities.

Original Article

A career expert argues that a 28-year-old designer is far from too old for the industry, and that agency experience, professional skills, and strategic career moves matter more than age when pursuing bigger creative opportunities.

Digest devoured!

Jun 4
