- Classic consistent hashing forces a tradeoff: Hash by series for even writes, or by metric for fast reads. You can't have both.
- Subrings turn the dial: Nesting a small ring inside the main ring for each metric lets you tune the balance between write distribution and read locality.
- Dynamic sizing removes the guesswork: Each subring grows or shrinks based on the actual number of series. A 1,000-series metric and a 1,000,000-series metric each get the right spread automatically.
- Fault tolerance comes free: The subring model extends the ring's existing failover semantics. Resilience and load balancing are the same mechanism.
- This powers MetricsDB at scale: The approach lets Axiom ingest and query metrics across a multi-tenant fleet without per-metric bottlenecks or fan-out storms.
Axiom's MetricsDB stores and queries metrics for customers operating at very different scales, sometimes on the same cluster. The engineering challenge behind it is a distribution problem: how do you spread metric data across a fleet of nodes so that writes are balanced, reads are fast, and the system stays resilient when nodes fail — all at the same time?
The foundation is consistent hashing, a strategy for distributing load fairly in a distributed system. The short version: compute a hash for each entity, assign each node in your distributed system a slice of the total hash space, and send the entity to the node whose slice the hash falls into. We call this a "ring."
This method is widely used and works extremely well if the number of entities is high and they're reasonably uniform in size. In this case, we get a fair distribution of load.
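To make this concrete, here's a minimal sketch of the idea in Python. For brevity it places keys with a simple modulo over equal slices rather than with ring positions (real consistent hashing uses positions so nodes can join and leave cheaply); all names and counts are illustrative, not MetricsDB's actual encoding.

```python
import hashlib

NODES = 8  # illustrative ring size

def ring_hash(key: str) -> int:
    # Stable 64-bit hash of the key.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def node_for(key: str, nodes: int = NODES) -> int:
    # Map the hash into one of `nodes` equal slices of the hash space.
    return ring_hash(key) % nodes

# Many uniformly sized entities spread evenly across the ring.
series = [f'prod/cpu[host="host{i}", core={c}]'
          for i in range(100) for c in range(8)]
counts = [0] * NODES
for s in series:
    counts[node_for(s)] += 1
print(counts)  # roughly 100 entities per node
```

With 800 small, similar entities, every node ends up with close to the same share, which is exactly the "high count, uniform size" condition described above.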
The tradeoff that started it all
As MetricsDB is effectively a distributed database, we looked into using a similar approach, but it poses a few problems. Let's look at the naive implementation. We identify each entity as the dataset plus the metric, together with all the tags associated with it. For example, prod/cpu[host="bob", core=7]. This spreads load evenly over all the nodes and makes the write path very efficient.
However (there is always a however, isn't there?), on the read path this is quite terrible. A query usually spans a single metric and often large portions of the series in a metric. Considering that we're operating a multi-tenant system at large scale, this would mean that every query running would need to touch every single database node. This isn't scalable. In addition, spreading a metric over all nodes means we lose the compression advantage that data locality gives us. Compressing the metrics of a single series together yields a much higher compression ratio than doing it individually. For Axiom customers, that fan-out would translate directly into slower dashboards and higher query latency.
Back to the drawing board. If we look at the situation from the query side, it'd be ideal to have each metric on a single node. Our entity becomes the dataset and the metric, and ignores the tags. This means all data for a metric lands on a single database node, which is ideal for both compression and query.
However (have I said there is always a however?), now this is terrible on the write path. The formerly even distribution provided by a large number of uniformly sized entities is ruined. We now have a small number of unevenly sized entities (we're talking about size differences in the 1000x range, not 2x range). This also poses scalability issues. The number of series a metric can contain is now limited by the size of a single node. In a multi-tenant platform, that ceiling would eventually become a blocker for customers with high-cardinality workloads.
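The two extremes above can be compared with a toy sketch. It reuses the modulo placement from before; the key formats and series counts are made up for illustration.

```python
import hashlib

def node_for(key: str, nodes: int = 8) -> int:
    # Stable hash mapped onto one of `nodes` slices.
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return h % nodes

# Extreme 1: hash by dataset + metric + tags.
# Every series of the metric lands somewhere else, so a query
# over the metric must fan out to every node.
series_nodes = {node_for(f'prod/cpu[host="h{i}"]') for i in range(10_000)}
print(len(series_nodes))  # all 8 nodes are touched

# Extreme 2: hash by dataset + metric only.
# Every series lands on one node: ideal reads, terrible write balance.
metric_nodes = {node_for("prod/cpu") for _ in range(10_000)}
print(len(metric_nodes))  # a single node holds everything
```

The first extreme is the even-write, fan-out-read case; the second is the fast-read, hot-node case.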
Resiliency to the rescue
Let's take a quick detour here. For ring-based systems, it's quite common to handle failures by sending data that belongs to a failed node to the next node in the ring. The name "ring" starts to make more sense here: when we reach the last node, the next one is the first node. They connect. For example, say we have 8 nodes in a ring. If node 3 has an issue, node 4 takes its load. If node 8 fails, node 1 takes its load. And so on.
There is an observation to be made here: we already deal with more than a single node. We always have to consult the node and its neighbor.
What if we always send data to the failover node, partition by dataset and metric, and then split that evenly over the two nodes? But then what do we do if the secondary node fails? Pass its load to the next node again? That way, we're back to having to ask all the nodes on the read path... Not good.
However! (There is always a however, isn't there? But this time, it's a good however.) What if we treated the node and the failover node as a ring? A ring centered around the node that the dataset-metric combination hashes to? We'd have the primary node, the one that the dataset-metric combination hashes to, and the failover node as a second member of the ring. If the primary node fails, its part of the dataset-metric data goes to the secondary. If the secondary fails, its part of dataset-metric data goes to the tertiary. (Side note: the secondary node will itself be the primary node for other dataset-metric combinations).
This way, we create a large ring that hosts a subring for every dataset-metric combination. And with that realization, we're no longer limited to a subring size of two. Why should we have only one failover? Let's have 3! More resilience and better balancing.
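The placement rule can be sketched in a few lines: the dataset-metric hash picks the subring's primary node on the main ring, and the tag hash picks a member within the subring, which is a run of consecutive nodes starting at the primary. Ring size, key formats, and subring sizes here are illustrative assumptions.

```python
import hashlib

NODES = 8  # main ring size (illustrative)

def h(key: str) -> int:
    # Stable 64-bit hash of the key.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def subring_node(dataset_metric: str, tags: str,
                 subring_size: int, nodes: int = NODES) -> int:
    # Primary node: where the dataset-metric combination hashes to.
    primary = h(dataset_metric) % nodes
    # Member within the subring: chosen by the tag hash.
    offset = h(tags) % subring_size
    # Subrings wrap around the main ring, just like failover does.
    return (primary + offset) % nodes

# subring_size=1 puts every series of the metric on one node;
# subring_size=NODES spreads them over the whole ring.
for k in (1, 3, NODES):
    placement = {subring_node("prod/cpu", f'host="h{i}"', k)
                 for i in range(1_000)}
    print(k, sorted(placement))
```

Note how failover falls out of the same arithmetic: if the primary dies, its offset-0 share simply shifts to the next consecutive node, which is already a subring member.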
We now have a knob we can turn to adjust our tradeoffs. Going back to the original problem, we've already modeled the two extreme cases:
- Subring size equals the ring size: We hash on dataset-metric-tag combinations. This gives near-perfect distribution but is really bad for reads.
- Subring size of 1: We only hash on dataset-metric combinations. This is really good for reads but terrible for writes.
Now we can pick something between the two extremes to find a tradeoff that works for us.
Choice is hard, or is it?
What's the ideal size for a subring? This is where it gets complicated, because there isn't one. A metric that holds 1,000 series has a different ideal subring size than a metric that holds 1,000,000 series. If we picked a subring size of 5, it might be too big for 1,000 series and too small for 1,000,000 series. If we go bigger, we make it worse for the 1,000-series metric. If we go smaller, we make it worse for the 1,000,000-series metric.
On the bright side, we've already improved on the original problem: a subring size of 5 is better than both extremes of "all" or "one."
Earlier, we had a ring with a large number of similarly sized entries and got a near-perfect distribution. We no longer have this situation on the main ring, but we do have it on the subring: hashing by the tags gives every entry roughly the same size, and if we have enough series, we also have a large number of entries.
Why is this important? Once a metric has enough series to require a larger subring, the properties of hashing mean that every node in the subring has about the same number of series. This also means every node can calculate the approximate total number of series in the metric.
With this information, the primary node of each subring can decide how large the ideal subring for a given dataset-metric combination should be. As the primary node exceeds a defined upper threshold of series, it grows the subring; if it falls below the lower threshold, it shrinks it. The math is simple: `new_subring_size = current_series * current_subring_size / ideal_series_per_node`. Space the upper and lower thresholds appropriately to prevent oscillation, and we get a very stable system that provides near-ideal utilization irrespective of the number of series in a metric.
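The resize decision can be sketched as follows. The key step is that the primary sees roughly 1/subring_size of the metric's series, so multiplying its local count by the current subring size estimates the total. The target of 50,000 series per node and the size cap are made-up parameters, not MetricsDB's actual thresholds.

```python
def ideal_subring_size(series_on_primary: int,
                       current_subring_size: int,
                       ideal_series_per_node: int,
                       min_size: int = 1, max_size: int = 8) -> int:
    # Hashing spreads series evenly, so the primary's local count
    # times the subring size approximates the metric's total series.
    estimated_total = series_on_primary * current_subring_size
    # new_subring_size = current_series * current_subring_size / ideal_series_per_node
    size = max(1, round(estimated_total / ideal_series_per_node))
    # Clamp to the main ring's bounds (illustrative cap).
    return max(min_size, min(max_size, size))

# Target of 50,000 series per node (made-up threshold):
print(ideal_subring_size(120_000, 1, 50_000))  # grows to 2
print(ideal_subring_size(10_000, 4, 50_000))   # shrinks to 1
```

In practice you'd only act when the count leaves a hysteresis band around `ideal_series_per_node`, so a metric hovering near a boundary doesn't flip the subring size back and forth.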
In other words: we can have our cake and eat it too.
Try it out
If you want to play around with this idea, here is a simple simulator.
Each color represents a different metric type. The hash key is the metric name, which determines the subring (that is, a set of consecutive nodes of size k). Tags are hashed into the selected subring.
Use the Ingest rate slider to start sending data. Play with different distributions by changing the number of tags in the Metric palette, tune the Subring size, or set it to Auto to let the primary node decide.
Why this matters for Axiom
MetricsDB serves a multi-tenant fleet where one customer might send a handful of low-cardinality system metrics and the next sends millions of high-cardinality application series. A static hashing strategy forces you to pick a point on the write-balance vs read-efficiency spectrum and live with the consequences. Dynamic subrings remove that constraint.
- Faster queries at every scale: Keeping a metric's data on a small, bounded set of nodes means queries touch only the nodes that matter. Dashboards, alerts, and ad-hoc explorations all benefit from reduced fan-out.
- Better compression, lower cost: Data locality within a subring means series belonging to the same metric are compressed together, yielding significantly higher compression ratios than scattering them across the cluster. That translates directly into lower storage costs.
- Resilience without overhead: The subring model doesn't bolt fault tolerance on as an afterthought. It extends the ring's existing failover semantics. When a node fails, its share of each subring shifts to the next neighbor. No special recovery path, no coordination storm.
- No per-metric ceilings: Because subrings grow with the data, no single metric is bottlenecked by the capacity of a single node. Customers can scale their workloads without hitting invisible walls.
What's next
The subring model is running in production today, powering MetricsDB's write and query paths. We're continuing to refine the thresholds and explore how the same principle applies to other parts of the storage layer.
To see what scalable metrics infrastructure looks like in practice, sign up for Axiom for free or explore the Axiom Playground.