Devoured - April 23, 2026
Columnar Storage is Normalization (3 minute read)

A conceptual reframing of columnar databases as extremely normalized row stores where each column is its own table joined by ordinal position.

What: The article offers a mental model for columnar storage through the lens of database normalization: splitting data by column amounts to creating separate tables, one per attribute, that join on an implicit primary key given by array position.
Why it matters: This perspective ties relational operations such as joins and projections to decisions about physical data layout, so developers who already understand normalization can reason about columnar storage with familiar concepts instead of treating it as an unrelated encoding scheme.
Takeaway: When reasoning about columnar database performance, think of row reconstruction as a join across several single-attribute tables; this makes the tradeoff between column scans and row fetches easier to see.
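The normalization view can be sketched in a few lines of Python. This is a minimal illustration, not the article's code: the schema, the sample rows, and the helper names (normalize, reconstruct, drop_ids, reconstruct_positional, project) are all hypothetical. It splits a row table into one (id, value) table per attribute, rebuilds the rows with a hash join on id, and then shows that once the id is left implicit as the list index, the same join reduces to zipping the value lists by position.

```python
# Hypothetical example data; schema and values are illustrative only.
SCHEMA = ("name", "age", "city")
ROWS = [
    ("alice", 34, "Berlin"),
    ("bob", 27, "Lisbon"),
    ("carol", 41, "Oslo"),
]


def normalize(rows, schema):
    """Split a row table into one (id, value) table per attribute.

    The id is each row's ordinal position, i.e. the primary key that a
    column store leaves implicit.
    """
    return {
        attr: [(pos, row[i]) for pos, row in enumerate(rows)]
        for i, attr in enumerate(schema)
    }


def reconstruct(tables, schema):
    """Rebuild rows with a hash join of every attribute table on id."""
    indexes = {attr: dict(tables[attr]) for attr in schema}  # id -> value
    ids = [pos for pos, _ in tables[schema[0]]]
    return [tuple(indexes[attr][pos] for attr in schema) for pos in ids]


def drop_ids(tables):
    """Keep only the values; each id is recoverable as the list index."""
    return {attr: [value for _, value in t] for attr, t in tables.items()}


def reconstruct_positional(columns, schema):
    """With implicit ids, the join on position degenerates to zip."""
    return list(zip(*(columns[attr] for attr in schema)))


def project(tables, attrs):
    """A projection only selects attribute tables; untouched ones are never read."""
    return {attr: tables[attr] for attr in attrs}


tables = normalize(ROWS, SCHEMA)
print(tables["age"])                                            # [(0, 34), (1, 27), (2, 41)]
print(reconstruct(tables, SCHEMA) == ROWS)                      # True
print(reconstruct_positional(drop_ids(tables), SCHEMA) == ROWS)  # True
```

The hash join is written out only to make the relational reading explicit. Because every attribute table shares the same ids in the same order, a real column store can skip both the stored id and the hash index and align values by position, which is what reconstruct_positional does.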
Deep dive
  • Row-oriented storage keeps all attributes of a record together, which makes inserting and retrieving whole rows fast but forces queries on a few columns to read the other attributes as well
  • Column-oriented storage groups values by attribute across all records, optimizing column-specific operations like aggregations but making row reconstruction more expensive
  • Columnar storage can be conceptually modeled as extreme normalization where each column becomes a separate two-column table (id + value)
  • The primary key in this normalized view is the ordinal position in the array, which is implicit rather than stored
  • In this view, reconstructing a row from columnar storage is a join across all the column tables on ordinal position
  • This mental model bridges two ways of thinking about storage formats: as low-level encoding details and as high-level relational operations
  • Query engines typically treat row vs. column orientation as a transparent implementation detail, observable only through performance
  • Understanding this equivalence helps unify query processing operations (projections, joins) with physical data layout decisions
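The scan-versus-fetch tradeoff in the bullets above can be made concrete with a deliberately crude cost model. This is an illustrative assumption, not something from the article: it counts individual values read and the number of separate contiguous regions touched, and ignores pages, compression, caching, and vectorized execution. The function names and the example sizes are hypothetical.

```python
def scan_cost(layout, n_rows, n_cols, n_wanted):
    """Return (values read, contiguous regions touched) to scan n_wanted columns."""
    if layout == "row":
        # Attributes are interleaved, so scanning any column drags whole
        # records along: every value of every row is read.
        return n_rows * n_cols, 1
    if layout == "column":
        # Each single-attribute table is contiguous; only the wanted ones are read.
        return n_rows * n_wanted, n_wanted
    raise ValueError(f"unknown layout: {layout!r}")


def fetch_cost(layout, n_cols):
    """Return (values read, contiguous regions touched) to rebuild one full row."""
    if layout == "row":
        # The record's attributes sit next to each other: one region.
        return n_cols, 1
    if layout == "column":
        # One probe per attribute table: an n_cols-way join on ordinal position.
        return n_cols, n_cols
    raise ValueError(f"unknown layout: {layout!r}")


# Hypothetical sizes: 1M rows, 20 attributes, a query that needs 2 columns.
for layout in ("row", "column"):
    print(layout, scan_cost(layout, 1_000_000, 20, 2), fetch_cost(layout, 20))
# row (20000000, 1) (20, 1)
# column (2000000, 2) (20, 20)
```

Under these assumptions the column layout reads a tenth of the values for the two-column scan, while a single-row fetch touches twenty separate regions instead of one, which is the join cost the normalization view predicts.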
Decoder
  • Columnar storage: Database format that stores values from the same column together rather than storing complete rows together
  • Row-oriented storage: Traditional database format where all attributes of a record are stored adjacent to each other on disk
  • Normalization: Database design technique that decomposes tables into smaller related tables, linked by keys, to reduce redundancy
  • Projection: Database operation that selects specific columns from a table while discarding others
Original article

This post reframes column stores as simply normalized row stores.