OlmoEarth Embeddings Export (8 minute read)
AI2's OlmoEarth Studio now exports pre-computed embedding vectors from satellite imagery that enable similarity search, land-cover mapping, and change detection with minimal training data or compute.
Deep dive
- OlmoEarth Studio computes embeddings on-demand rather than serving pre-computed archives, so you can specify exact time ranges (1-12 monthly periods) and capture seasonal dynamics instead of just annual snapshots
- Three encoder variants offer different trade-offs: Nano (128-dim, 1.4M params), Tiny (192-dim, 6.2M params), and Base (768-dim, 89M params), with Tiny delivering strong performance at lower compute and storage cost
- Embeddings are exported as Cloud-Optimized GeoTIFFs with one band per dimension, stored as int8 (-127 to +127) for efficient distribution, then dequantized to floating-point for analysis
- Similarity search works by computing cosine similarity between a query pixel and all other pixels—urban areas cluster together, agricultural parcels form distinct groups, with no labels required
- Few-shot segmentation with a simple logistic regression on 192-dimensional embeddings produced coherent land-cover maps from just 60 labeled pixels (20 per class) with F1=0.84, and accuracy saturated quickly because embeddings do the heavy lifting
- Change detection compares embeddings from two time periods using cosine distance—monthly embeddings from September 2023 vs 2024 immediately highlighted the Park Fire burn scar in California with no training
- PCA reduction to three dimensions creates false-color visualizations where similar embeddings get similar colors automatically, revealing landscape structure like crop parcel boundaries without supervision
- All examples use frozen embeddings with zero task-specific training, showing the foundation model already learned useful representations, though supervised fine-tuning is available for higher-performance applications
- The code is remarkably simple: load the multi-band GeoTIFF with rasterio, reshape to (pixels, dimensions), train sklearn StandardScaler + LogisticRegression on labeled pixels, predict everywhere
- Outputs work with standard geospatial tools (QGIS, GDAL, rasterio) and integrate into existing workflows without specialized infrastructure
- Global visualization of 1.1M samples shows embeddings cluster by season and land type when reduced with PCA and k-means, demonstrating the model learned meaningful Earth surface patterns during pretraining
- Performance depends on input imagery quality—persistent cloud cover, atmospheric artifacts, or missing observations can affect embedding quality, so validation is recommended for each use case
Decoder
- Embeddings: Compact numerical vector representations that encode semantic information about data—similar locations get similar vectors, enabling comparison via simple operations like cosine similarity or clustering
- Foundation model: A large pre-trained neural network trained on broad data that learns general-purpose representations, which can then be adapted to specific tasks with minimal additional training
- COG (Cloud-Optimized GeoTIFF): A standard geospatial raster format optimized for efficient streaming and partial reads over HTTP, widely supported by GIS tools
- Sentinel-2 L2A: Multi-spectral optical imagery from ESA's Sentinel-2 satellites at 10-60m resolution, with atmospheric correction applied (Level-2A processing)
- Sentinel-1 RTC: Synthetic-aperture radar imagery from ESA's Sentinel-1 satellites, processed with Radiometric Terrain Correction to account for topographic effects; radar imaging works through clouds
- Linear probe: A standard evaluation technique where you freeze a pre-trained model's representations and train only a simple linear classifier on top, measuring how much task-relevant information the representations already contain
- PCA (Principal Component Analysis): Dimensionality reduction technique that finds the directions of maximum variance in high-dimensional data, often used to compress embeddings to 2-3 dimensions for visualization
Original article
Introducing OlmoEarth embeddings: Custom embedding exports from OlmoEarth Studio for downstream analysis
OlmoEarth Studio, our platform for building Earth observation models, now lets you compute and export embedding vectors—compact numerical representations of Earth-observation data produced by our open source OlmoEarth foundation models. The source code and model weights are publicly available alongside the research paper, so the community can inspect exactly how these embeddings are generated.
Embeddings are a fast, cost-effective entry point for leveraging OlmoEarth: they support a wide range of downstream tasks, from similarity search to segmentation to unsupervised exploration. Locations with similar surface characteristics end up with similar vectors; locations that differ land far apart. OlmoEarth embeddings have shown strong performance in our own benchmarking and in independent evaluations. The exported Cloud-Optimized GeoTIFFs (COGs) are lightweight and easy to share. Choose your area of interest, time range, encoder variant, resolution, and imagery sources via the Studio UI or API, and get back a COG you can use however you like. If your application requires higher performance, Studio also supports supervised fine-tuning (SFT).
Custom-computed embeddings are now available for users of OlmoEarth Studio. Reach out if you're interested in gaining access. Instructions for using the publicly available OlmoEarth models to compute your own embeddings are available here.
Computing embeddings in Studio
Computing embeddings follows the same workflow as any other prediction in Studio. First configure a model and run it, and then download the results. Several parameters tailor the output:
- Area of interest: Draw or upload any polygon; Studio handles imagery acquisition and tiling.
- Time span: 1-12 monthly periods.
- Encoder variant: Nano (128-dim, 1.4M params), Tiny (192-dim, 6.2M params), or Base (768-dim, 89M params).
- Spatial resolution: 10 meter, 20 meter, 40 meter, or 80 meter per pixel.
- Imagery sources: Sentinel-2 L2A, Sentinel-1 RTC, or both.
Studio delivers a COG with one band per embedding dimension. Vectors are stored as signed 8-bit integers (int8). Values range from -127 to +127, with -128 reserved for nodata. To recover floating-point vectors, see dequantize_embeddings in olmoearth_pretrain.
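For a rough sense of what the recovery step involves, here is a hand-rolled sketch; the symmetric 1/127 scale and the helper name below are assumptions for illustration, and dequantize_embeddings in olmoearth_pretrain remains the authoritative implementation:

```python
import numpy as np

NODATA = -128  # reserved nodata value in the exported COG

def dequantize(q: np.ndarray, scale: float = 1.0 / 127.0) -> np.ndarray:
    """Recover float vectors from int8-quantized embedding bands.

    Assumes a symmetric linear quantization onto [-127, 127]; the actual
    scale used by dequantize_embeddings may differ.
    """
    out = q.astype(np.float32) * scale
    out[q == NODATA] = np.nan  # mask nodata pixels
    return out

band = np.array([[-127, 0, 127], [-128, 64, -64]], dtype=np.int8)
vals = dequantize(band)  # nodata becomes NaN; valid values land in [-1, 1]
```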
Because everything is computed on demand rather than pulled from a pre-computed global archive, your embeddings reflect exactly the conditions you care about. You can generate monthly embeddings to capture seasonal dynamics, not just annual snapshots.
What you can do with OlmoEarth embeddings
The examples below all use OlmoEarth-v1-Tiny (192-dim) embeddings at 40-meter resolution with Sentinel-2 L2A composites (annual for most examples; monthly for change detection). Tiny is a lightweight encoder but still highly performant; for your own applications, you can swap it for a larger variant at the cost of higher compute and storage.
Similarity search: Finding "more like this"
Pick a query pixel, extract its embedding, and compute cosine similarity against every other pixel. The result is a heatmap showing where the landscape looks most and least like your query pixel.
This query sits near the Merced urban center in California. Urban fabric and road corridors light up coherently while agricultural parcels stay dark. The model distinguishes built-up surfaces from cropland without any labels.
Switching the query to a small agricultural window, we define the query vector as the mean of the embedding vectors over that window, then pull Sentinel-2 imagery at the highest- and lowest-similarity locations to see what the model treats as similar and dissimilar.
The most similar patches (0.89 and above) are all agricultural parcels with irrigated fields. The least similar (around zero) are an airport with surrounding bare ground, a reservoir with dry terrain, and arid rangeland. No training data, no labels, just a dot product in embedding space.
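Both the single-pixel and window-mean queries reduce to a normalized dot product over the embedding cube. A minimal sketch, assuming a dequantized float array of shape (C, H, W); the pixel and window coordinates in the comments are illustrative, not from the post:

```python
import numpy as np

def similarity_map(emb: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Cosine similarity between a query vector and every pixel.

    emb: (C, H, W) float embedding cube; query: (C,) vector.
    Returns an (H, W) heatmap with values in [-1, 1].
    """
    C, H, W = emb.shape
    flat = emb.reshape(C, -1)
    flat = flat / (np.linalg.norm(flat, axis=0) + 1e-8)
    q = query / (np.linalg.norm(query) + 1e-8)
    return (q @ flat).reshape(H, W)

# Single-pixel query (coordinates illustrative):
#   sim = similarity_map(emb, emb[:, row, col])
# Window-mean query over a small agricultural window:
#   sim = similarity_map(emb, emb[:, r0:r1, c0:c1].mean(axis=(1, 2)))
```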
Few-shot segmentation: Labeling the landscape
Similarity search tells you "where is it like this?" but sometimes you need discrete labels across a region. Because the representations are already rich, a simple linear classifier can produce a wall-to-wall land-cover map from very few labeled pixels.
To test this, we labeled just 60 pixels (20 per class) over Ca Mau, Vietnam, a coastal mangrove region. Using ESA WorldCover 2021 as the label source for three classes (mangrove, water, other), we randomly sampled 20 pixels per class, trained a logistic regression with per-feature standardization, and predicted every pixel in the region.
From 60 labeled pixels, the classifier produces a coherent map with weighted F1 = 0.84. Mangrove stands, tidal channels, and open water are delineated across the entire region. The classifier saturates quickly: increasing from 30 to 300 labels barely changes accuracy, because the embeddings are doing most of the heavy lifting.
The core of the analysis is a few lines of Python:
import rasterio
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Load the 192-band embedding COG exported from Studio
with rasterio.open("embeddings.tif") as ds:
    emb = ds.read().astype(np.float32)  # (192, H, W)
C, H, W = emb.shape
X = emb.reshape(C, -1).T  # (H*W, 192)

# train_idx holds the flat indices of the 60 labeled pixels and
# labels their per-pixel classes; train on those, predict everywhere
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))
clf.fit(X[train_idx], labels[train_idx])
prediction = clf.predict(X).reshape(H, W)
This is a linear probe, a standard evaluation for foundation models. The fact that a logistic regression over 192 dimensions recovers land-cover boundaries from so few labels means the Tiny encoder has organized these ecological distinctions during pretraining. Larger variants (Base, 768-dim) encode even richer representations.
If you have ground-truth polygons, field survey points, or a coarse existing map, you can train a similar classifier and produce a wall-to-wall map for your own region of interest.
Change detection: Spotting what shifted
Because Studio can generate embeddings at any temporal resolution (monthly through annual), you can compare two time periods directly to identify where surface conditions have changed. Below, we computed monthly Sentinel-2 embeddings for the same region in September 2023 and September 2024 and measured per-pixel cosine distance. The Park Fire (July-September 2024) burn scar in Butte County, California lights up immediately.
No labels or training required—just two embedding COGs and a few lines of Python.
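The comparison can be sketched as a per-pixel cosine distance between the two embedding cubes; the function below assumes dequantized (C, H, W) float arrays, and the filenames in the commented workflow are placeholders:

```python
import numpy as np

def cosine_distance_map(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-pixel cosine distance between two (C, H, W) embedding cubes.

    Returns an (H, W) map: 0 = unchanged direction, up to 2 = opposite.
    """
    an = a / (np.linalg.norm(a, axis=0) + 1e-8)
    bn = b / (np.linalg.norm(b, axis=0) + 1e-8)
    return 1.0 - (an * bn).sum(axis=0)

# With two exported COGs (filenames are placeholders), the workflow is:
#   with rasterio.open("sep_2023.tif") as ds:
#       t0 = ds.read().astype(np.float32)
#   with rasterio.open("sep_2024.tif") as ds:
#       t1 = ds.read().astype(np.float32)
#   change = cosine_distance_map(t0, t1)  # high values mark likely change
```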
Unsupervised exploration: Seeing what the model sees
Sometimes you have no query location or reference labels. You just want to understand what structure exists in the embeddings. Principal Component Analysis (PCA) is a clean way to do this: reduce to three dimensions, map to R/G/B, and display as a false-color image. Similar embeddings get similar colors automatically.
Flevoland, in the Netherlands, is a reclaimed polder landscape with a regular grid of agricultural parcels. The PCA false-color image reproduces those boundaries with high fidelity. Different crop types, water bodies, and urban areas each get distinct hues. The embedding has internalized landscape structure without ever being told what a parcel or crop is.
This kind of unsupervised view is a quick way to see what structure the model has picked up across your area of interest.
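The PCA-to-RGB rendering can be sketched in a few lines, again assuming a dequantized (C, H, W) float array; the 2nd/98th-percentile stretch is one common display choice, not necessarily the one used for the figures in the post:

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_false_color(emb: np.ndarray) -> np.ndarray:
    """Project (C, H, W) embeddings onto 3 principal components as RGB."""
    C, H, W = emb.shape
    X = emb.reshape(C, -1).T                      # (H*W, C)
    comps = PCA(n_components=3).fit_transform(X)  # (H*W, 3)
    # Percentile-stretch each component to [0, 1] for display
    lo = np.percentile(comps, 2, axis=0)
    hi = np.percentile(comps, 98, axis=0)
    rgb = np.clip((comps - lo) / (hi - lo + 1e-8), 0.0, 1.0)
    return rgb.reshape(H, W, 3)
```

The resulting array can be displayed directly with matplotlib's imshow or written back to a 3-band GeoTIFF for viewing in QGIS.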
From export to insight
Similarity search, few-shot segmentation, change detection, and PCA exploration are simple operations on standard raster data that run in seconds. The power comes from the embeddings: learned representations that compress Earth-observation data from many sensors and millions of training examples into vectors capturing rich information about each location.
Custom embedding exports are available now. Create a project, configure an embeddings model, and compute your embeddings. The exported GeoTIFF works with any geospatial tool: QGIS, GDAL, rasterio, or your own scripts. For end-to-end code reproducing the examples in this post, see the embeddings tutorial, which includes working code for similarity search, few-shot segmentation, change detection, and PCA visualization. To get hands-on without any local setup, try the Colab notebook.
Going further: fine-tuning
The examples in this post all use frozen embeddings with no task-specific training. Embeddings are a great entry point for leveraging OlmoEarth: they enable fast, cost-effective generation of results, work well in resource-constrained environments, and are easy to share. For applications that require higher performance, OlmoEarth Studio also supports SFT, training a task-specific model head on your own labels, which typically outperforms linear probes on frozen features.
Limitations
While we are always working to improve our pretraining approaches, it's important to check the quality of the embeddings for your use case using some of the techniques described above. Performance also depends on the quality of the input imagery—persistent cloud cover, atmospheric artifacts, or missing observations in the composite period can affect the resulting vectors.