The Sortable interface: Teaching every column type to sort itself

Hassan Ezzeldeen, Software Engineer
Mano Toth, Senior Technical Writer
Tomás Senart, Principal Engineer
April 3, 2026
  • Generic sorting ignores structure: Go's built-in sort is great at general-purpose sorting, but blind to how columnar data is actually stored
  • Low-cardinality columns win big: Enum and bitmap columns hit 26x and 19.6x speedups with counting sort by exploiting their bounded value sets
  • Trivial structure, trivial sort: Boolean columns with only three possible values jumped 8.8x using a single-pass three-way partition
  • One interface, ten algorithms: A clean Sortable interface lets each column type run the algorithm that fits it, while the query engine calls one method and moves on
  • Every column benefited: Even the hardest case (mixed-type variant data) improved by 2x

There is a class of engineering problem that sounds straightforward until you actually look at it. Sorting is one of them.

When your database needs to sort query results, the naive answer is: call sort. That works. Go's standard library includes a general-purpose sort. It's battle-tested, and it's almost certainly not the right tool for a columnar database.

The problem isn't that the algorithm is wrong. The problem is that the algorithm knows nothing about the data it's sorting. It sees a list of rows and a comparator function. It has no idea that the column it's sorting contains at most fifteen distinct values. It doesn't know that boolean fields can only ever be true, false, or null. It can't see that every string in the column has already been deduplicated into a dictionary of a few thousand entries. All that carefully maintained and precisely encoded structure gets completely ignored. The sort starts from scratch every time.

This is the story of the Sortable interface: a project that gave each column type the ability to sort itself, exploiting its own internal structure to replace Ω(n * log(n)) comparisons with algorithms that fit the data.

The result: speedups between 2x and 26x across every column type in our engine, on every query that touches an order by or a topk operation.

The problem with one-size-fits-all

Our query engine processes data in fixed-size blocks. Every time a query requests sorted output (an order by clause, a topk aggregation, anything that requires ranked results) the engine needs to sort the rows in those blocks by one or more column values.

Before this project, sorting worked the same way for every column type: Go's slices.SortFunc. You hand it a slice of row indices and a comparison function. The comparison function calls Value() on each element to materialize a generic value, then compares them. Simple, correct, and deeply unaware of what it was comparing.

For any column that stores data in a structured way, this meant throwing that structure away on every single comparison. A column that encodes enum values as small integers had to re-materialize those integers into generic values, then compare them generically. A column that stores strings in a dictionary had to resolve the dictionary entry on every comparison, then do a full string comparison, even though the dictionary only contains a handful of distinct entries and could have been sorted once and reused.

The comparison-based lower bound is Ω(n * log(n)). That bound holds for arbitrary data. But our data isn't arbitrary.

One interface, ten algorithms

The fix was an interface. Specifically, a Sortable interface that column types can implement:

type Sortable interface {
    Sort(matches []int32, desc bool) (ties [][2]int)
    Select(matches []int32, k int, desc bool) (tieStart, tieEnd int)
}

Sort takes a slice of row indices and sorts them in place. It returns tie ranges (spans where consecutive values are equal) so multi-column sorts can refine ordering within those spans. Select is the partial-sort variant for topk: find the k-th element and return the boundary of the tie range around it, without sorting the rest.

The interface is deliberately minimal. Behind that single interface, each column can run whatever algorithm best fits its structure. The query engine doesn't need to know. It checks whether the column implements Sortable; if it does, it calls Sort and gets back a correctly ordered result. What happened in between is the column's business.
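That dispatch can be sketched in a few lines. This is an illustrative reconstruction, not Axiom's actual code: the `Column` interface, `intColumn`, and `sortRows` are hypothetical stand-ins, and `Select` is omitted to keep the focus on the `Sort` path.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// Sortable mirrors the interface from the post (Select omitted).
type Sortable interface {
	Sort(matches []int32, desc bool) (ties [][2]int)
}

// Column is a hypothetical stand-in for the engine's column abstraction.
type Column interface {
	Compare(a, b int32) int // compare the values at two row indices
}

// intColumn is a toy column that only supports generic comparison.
type intColumn []int64

func (c intColumn) Compare(a, b int32) int { return cmp.Compare(c[a], c[b]) }

// sortRows dispatches: use the column's own algorithm when it
// implements Sortable, otherwise fall back to the generic sort.
func sortRows(col Column, matches []int32, desc bool) {
	if s, ok := col.(Sortable); ok {
		s.Sort(matches, desc)
		return
	}
	slices.SortFunc(matches, func(a, b int32) int {
		if desc {
			return col.Compare(b, a)
		}
		return col.Compare(a, b)
	})
}

func main() {
	matches := []int32{0, 1, 2}
	sortRows(intColumn{5, 1, 3}, matches, false)
	fmt.Println(matches) // [1 2 0]
}
```

The engine side stays this small regardless of which algorithm a column runs behind the interface.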

A dozen column types now implement this interface. Each chose a different algorithm.

Bounded values, trivial sorts

The biggest wins came from the simplest observation: some columns have a bounded number of possible values. When you know that, comparison-based sorting isn't just slow. It's the wrong class of algorithm entirely.

EnumColumn stores categorical fields like log levels, HTTP methods, and status codes. These columns have at most 15 distinct values. Sort that small dictionary once to establish rank order, then do a single-pass counting sort using those ranks as bucket keys. Counting sort runs in O(n + k) time. With k ≤ 15, k is a rounding error. 26x average speedup.
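A minimal sketch of that strategy, with hypothetical names: `codes[row]` is the small integer an enum column stores per row, and `rank[code]` is that code's position in the sorted dictionary, computed once per sort rather than once per comparison.

```go
package main

import "fmt"

// countingSortByRank orders row indices by the rank of their enum
// code. O(n + k) with k <= 15, so the count arrays are tiny.
func countingSortByRank(matches []int32, codes []uint8, rank []int, desc bool) {
	k := len(rank)
	counts := make([]int, k)
	for _, row := range matches {
		counts[rank[codes[row]]]++
	}
	// prefix sums turn counts into the starting offset of each bucket
	offsets := make([]int, k)
	sum := 0
	for r := 0; r < k; r++ {
		offsets[r] = sum
		sum += counts[r]
	}
	out := make([]int32, len(matches))
	for _, row := range matches {
		r := rank[codes[row]]
		out[offsets[r]] = row
		offsets[r]++
	}
	if desc { // reverse for descending order
		for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
			out[i], out[j] = out[j], out[i]
		}
	}
	copy(matches, out)
}

func main() {
	// dictionary: code 0="info", 1="debug", 2="error"
	// sorted order: debug < error < info, so rank = [2, 0, 1]
	codes := []uint8{0, 2, 1, 2, 0}
	rank := []int{2, 0, 1}
	matches := []int32{0, 1, 2, 3, 4}
	countingSortByRank(matches, codes, rank, false)
	fmt.Println(matches) // [2 1 3 0 4]
}
```

Note that no per-row value is ever materialized: the only work per row is two array lookups and an integer write.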

BitmapColumn stores set-valued fields like tags and feature flags. These columns have a bounded number of distinct sets. Same strategy: sort the set labels lexicographically, assign ranks, counting sort by rank. 19.6x average speedup.

BooleanColumn can only ever contain true, false, or null. Three possible values. There is a classical algorithm for exactly this: the Dutch National Flag algorithm. Three pointers scan the array once, partitioning elements into zones: [null | false | true] for ascending, reversed for descending. One pass, no comparisons, no materializing values. When there are no nulls, it degenerates to an even simpler two-way partition. 8.8x average speedup.
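The partition itself fits in one loop. A sketch under illustrative assumptions: `val` stands in for the column's per-row lookup and returns 0 for null, 1 for false, 2 for true.

```go
package main

import "fmt"

// threeWayPartition is the Dutch National Flag pass: one scan moves
// each row index into its zone, [null | false | true] for ascending.
func threeWayPartition(matches []int32, val func(int32) int) {
	lo, mid, hi := 0, 0, len(matches)-1
	for mid <= hi {
		switch val(matches[mid]) {
		case 0: // null → front zone
			matches[lo], matches[mid] = matches[mid], matches[lo]
			lo++
			mid++
		case 1: // false → middle zone, already in place
			mid++
		default: // true → back zone
			matches[mid], matches[hi] = matches[hi], matches[mid]
			hi--
		}
	}
}

func main() {
	vals := []int{2, 0, 1, 2, 0, 1} // per-row: 0=null, 1=false, 2=true
	matches := []int32{0, 1, 2, 3, 4, 5}
	threeWayPartition(matches, func(row int32) int { return vals[row] })
	fmt.Println(matches) // null rows, then false rows, then true rows
}
```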

The first enum benchmark came back at 26x and we re-ran it, assuming something was broken. Same number. That's when the scope of the project became clear: we weren't replacing a bad algorithm. We were replacing a general one with specific ones. And every column type in our engine has structure the general algorithm was ignoring.


Structure hiding in plain sight

Not every column has fifteen values. But every column has structure, and the sort can use it.

IntColumn and FloatColumn store dense numeric arrays. Radix sort distributes elements into buckets by digit, pass by pass, achieving O(n) time with no comparisons at all. For floats, IEEE 754 numbers are bit-manipulated into a sort-order-preserving unsigned integer representation: negative floats get all their bits flipped, positive floats get only the sign bit flipped, and the results are sorted as integers. Both implementations fuse a sortedness-detection pass with the initial key-building pass: if the data is already sorted, the algorithm exits immediately. For small inputs, they fall back to pattern-defeating quicksort, which beats radix sort at small sizes thanks to cache behavior. IntColumn: 3.7x. FloatColumn: 3.8x.
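The bit trick is compact enough to show in full. This is the standard sort-order-preserving transform for IEEE 754 doubles, demonstrated here with a comparison sort over the keys; the radix passes themselves are omitted.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// floatKey maps a float64 to a uint64 whose unsigned order matches
// the float's numeric order: negatives get every bit flipped (which
// reverses their magnitude order), positives only the sign bit
// (which lifts them above all negatives).
func floatKey(f float64) uint64 {
	bits := math.Float64bits(f)
	if bits&(1<<63) != 0 {
		return ^bits
	}
	return bits | 1<<63
}

func main() {
	vals := []float64{3.5, -1.25, 0, -7.5, 2}
	sort.Slice(vals, func(i, j int) bool {
		return floatKey(vals[i]) < floatKey(vals[j])
	})
	fmt.Println(vals) // ascending numeric order
}
```

Once every value is a uint64 key, the column can run the exact same radix passes it uses for integers.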

StringColumn stores text using dictionary encoding: a compact dictionary of unique values, with each row holding an index into that dictionary. A column with 65,000 rows might have only a few thousand unique strings. Sort the dictionary once, an O(d * log(d)) operation where d is the number of distinct values, assign each entry a numeric rank, then radix-sort the row indices by rank. Ω(n * log(n)) string comparisons become O(d * log(d) + n) integer operations. For lower-cardinality fields like hostnames or service names, where d is much smaller than n, the improvement is dramatic. For small match sets, where the dictionary-building overhead isn't amortized, the implementation falls back to a direct comparison sort. 3.4x average speedup.
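A sketch of the dictionary-rank approach, with hypothetical names: `dict` holds the column's unique strings and `ids[row]` indexes into it. A real implementation would radix-sort the ranks; the stdlib stable sort keeps the sketch short.

```go
package main

import (
	"fmt"
	"slices"
	"sort"
)

// sortByDictRank sorts the dictionary once, derives a rank per entry,
// then orders rows with integer comparisons only. String comparisons
// happen d times, not n*log(n) times.
func sortByDictRank(matches []int32, ids []int, dict []string) {
	// sort dictionary indices by string value: O(d log d)
	order := make([]int, len(dict))
	for i := range order {
		order[i] = i
	}
	sort.Slice(order, func(a, b int) bool { return dict[order[a]] < dict[order[b]] })
	// invert into rank-per-dictionary-entry
	rank := make([]int, len(dict))
	for r, id := range order {
		rank[id] = r
	}
	// order rows by rank; ties keep their input order
	slices.SortStableFunc(matches, func(a, b int32) int {
		return rank[ids[a]] - rank[ids[b]]
	})
}

func main() {
	dict := []string{"web", "api", "db"} // unique values, insertion order
	ids := []int{0, 1, 2, 1, 0}          // per-row dictionary indices
	matches := []int32{0, 1, 2, 3, 4}
	sortByDictRank(matches, ids, dict)
	fmt.Println(matches) // [1 3 2 0 4]
}
```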


Compose, don't rewrite

Some columns don't have a single clean structure to exploit. They have a structural problem to isolate before the real sort can begin.

Sparse columns, that is, optional fields like error details or debug metadata, store only non-null values compactly, with a bitmask tracking which rows have data. The Sortable implementation partitions nulls to one end using the bitmask (a single O(n) scan), then delegates sorting of the non-null portion to the underlying type's algorithm. Sparse integers use radix sort. Sparse floats use the IEEE 754 approach. Sparse strings use a direct comparison sort on the compacted non-null entries. Null handling, which used to pollute every comparison in the generic path, is handled once and completely isolated. Sparse[int64]: 5.3x. Sparse[float64]: 5.1x. Sparse[string]: 4.6x.

VariantColumn handles schemaless data: a field where some rows contain strings, others numbers, others objects. The implementation partitions rows by type using a single counting-sort pass, sorts each partition using its sub-column's own Sortable implementation, then merges the results. The offset-translation between parent and sub-column indices introduces overhead, and there's room to optimize it further. Even so: 2x average speedup.
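The partition-then-delegate shape can be sketched as follows. Everything here is illustrative: `tag` stands in for the per-row type lookup, and the per-partition stdlib sort stands in for each sub-column's own Sortable implementation.

```go
package main

import (
	"fmt"
	"slices"
	"sort"
)

// sortVariant partitions row indices by type tag with one counting
// pass, then sorts each partition with a per-type comparator.
func sortVariant(matches []int32, tag func(int32) int, numTags int,
	less func(t int, a, b int32) bool) {
	counts := make([]int, numTags)
	for _, row := range matches {
		counts[tag(row)]++
	}
	offsets := make([]int, numTags) // starting offset per type partition
	sum := 0
	for t := 0; t < numTags; t++ {
		offsets[t] = sum
		sum += counts[t]
	}
	starts := slices.Clone(offsets)
	out := make([]int32, len(matches))
	for _, row := range matches { // counting-sort pass by type tag
		t := tag(row)
		out[offsets[t]] = row
		offsets[t]++
	}
	copy(matches, out)
	for t := 0; t < numTags; t++ { // delegate each partition
		part := matches[starts[t] : starts[t]+counts[t]]
		sort.Slice(part, func(i, j int) bool { return less(t, part[i], part[j]) })
	}
}

func main() {
	tags := []int{0, 1, 0, 1} // rows 0,2 hold numbers; rows 1,3 strings
	nums := map[int32]float64{0: 5, 2: 2}
	strs := map[int32]string{1: "b", 3: "a"}
	matches := []int32{0, 1, 2, 3}
	sortVariant(matches, func(r int32) int { return tags[r] }, 2,
		func(t int, a, b int32) bool {
			if t == 0 {
				return nums[a] < nums[b]
			}
			return strs[a] < strs[b]
		})
	fmt.Println(matches) // [2 0 3 1]: numbers first, each partition sorted
}
```

The index bookkeeping between `matches` and the per-type partitions is where the overhead mentioned above lives.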

This is where the interface design paid off. The Sortable contract let us change each column's sorting implementation independently, behind a single contract. Composite columns delegate to their children without knowing what algorithm those children chose.

The full picture

Column type      Algorithm                        Average speedup
EnumColumn       Counting sort (k ≤ 15)           26.0x
BitmapColumn     Counting sort (bounded k)        19.6x
BooleanColumn    Dutch National Flag               8.8x
Sparse[int64]    Null partition + radix sort       5.3x
Sparse[float64]  Null partition + radix sort       5.1x
Sparse[string]   Null partition + dict-rank        4.6x
FloatColumn      Radix sort + IEEE 754 encoding    3.8x
IntColumn        Radix sort (multi-strategy)       3.7x
StringColumn     Dict-rank + radix sort            3.4x
VariantColumn    Type-partition + delegate         2.0x

Benchmarks run with -benchmem -count=2 against 65K-row blocks on Apple M4.

Every column type improved. The spread from 26x to 2x reflects how much exploitable structure each type has. Enum columns have a great deal of structure; variant columns have the least. But a 2x improvement on the hardest case is still a 2x improvement.

What this teaches

A few things became clear during this project that are worth stating plainly.

Generic algorithms aren't wrong. They're just general. slices.SortFunc is excellent. It's what you reach for when you don't know anything about your data. When you do know something about your data, you should use that knowledge. The whole point of a columnar storage format is to colocate and compress data by type. If your sorting code ignores that organization, you're building an expensive system and leaving performance on the table.

Bounded cardinality is extremely powerful. The jump from "arbitrary values" to "at most 15 distinct values" doesn't just save a constant factor. It changes the algorithm class entirely. Comparison-based sorting has an n log n floor. Counting sort doesn't. If you know your cardinality is bounded, counting sort wins, and it wins by more the larger your data gets.

Interface design matters for algorithms. The Sortable interface gave us a way to change ten sorting implementations independently, behind a single contract. Adding Select for partial sorting alongside Sort meant that topk queries (which only need the top k elements, not a full sort) could be optimized in exactly the same framework, without changing the query engine at all.

Small allocations accumulate. A significant part of the speedup isn't just a better algorithm; it's eliminating allocations. Generic comparison-based sorting materializes values on every comparison. Counting sort on an enum column materializes nothing: all bookkeeping is stack-allocated integers. At 65,000 rows per block, eliminating per-row allocations matters enormously under load.

What's next

One direction we're exploring is offloading sorting from the browser to the API. With the engine now sorting this fast, operations that previously made sense to do client-side become good candidates for the server. This means faster, more responsive results for every Axiom user, at any data scale.

If you want to see what fast sorting feels like in practice, sign up for Axiom for free or explore the Axiom Playground.

