Most analytics tools can tell you that 4,000 people visited your getting-started guide last month. What they can't tell you is whether any of those people actually got started.
This is the fundamental problem with documentation measurement. Traditional metrics describe activity: pageviews, session duration, bounce rate. They tell you that something happened. They don't tell you whether what happened was useful.
An important use case for Axiom is observability: helping people understand what their systems are doing and why. I realised the same principles could apply to documentation, to determine whether our docs do their job, which is to get people from confusion to competence as quickly as possible.
So I built Do11y (a short name for "documentation observability"), and today we're open-sourcing it.
Existing tools vs Do11y
When solving a new problem in software, the first question to ask is whether you should buy an existing solution or build your own. There are plenty of analytics tools available. Some of them are very good at what they do. The problem is that what they do isn't what documentation teams need.
Off-the-shelf analytics tools are built for marketing funnels. They're optimized for tracking conversion paths: visitor lands on page, clicks CTA, enters email, becomes lead. The entire data model is oriented around that progression. When you bolt this onto a documentation site, you end up collecting a lot of data that doesn't help you and missing signals that would.
There is a fundamental difference between marketing and documentation sites. Marketing sites aim to engage people for as long as possible and get them to convert. Documentation sites aim to hold readers' attention for as short a time as possible and help them achieve their goals using the product. The objectives, the criteria of success, and the metrics should be different, yet we often try to use the same tools to measure them.
There was also the privacy question. Most analytics tools collect far more than they need: device fingerprints, cross-site identifiers, data that ends up requiring a cookie consent banner and an entry in your privacy policy. For a documentation site, this felt like the wrong trade-off. I didn't want to know the PII (Personally Identifiable Information) of our readers. I wanted to know whether the docs worked.
What Do11y collects
Do11y treats documentation pages as instrumented software. It captures behavioral events that map to real questions a documentation team would ask:
- Do readers find the right pages? Do11y tracks entry points, traffic sources, and referrer domains, including whether the visitor arrived from an AI platform like ChatGPT, Perplexity, or Claude. This last part turned out to matter more than I expected. A growing share of documentation traffic now arrives via AI-generated answers that link to your docs, and those visitors behave differently from someone who typed a search query.
- Do readers read what we wrote? Scroll depth is a blunt instrument. Do11y also uses `IntersectionObserver` to track which sections actually enter the viewport and for how long. The difference between "the user scrolled past the Configuration section" and "the user spent 30 seconds reading the Configuration section" is the difference between a vanity metric and a useful signal.
- Where do readers get stuck? If a user opens search on a page, that's often a signal that the page didn't answer their question (unless it's the landing page of the docs and the user came there to search in the first place). If they expand every collapsible section, the content that's hidden might be the content they came for. If they consistently exit the docs from the same page, that page might have a problem (unless they spend a lot of time on that page, work their way through it, and then happily leave the docs).
- Do readers actually use code examples? Do11y tracks which code blocks get copied and in which language. On a page with tabbed examples for Python, Node.js, and Go, tab switches tell you what your audience's stack actually looks like, which is often different from what you assumed when you chose the default tab.
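The bookkeeping behind section-level timing is simple to sketch. In the browser, an `IntersectionObserver` callback would drive the `enter`/`leave` calls as sections cross the viewport; here the clock is injected so the logic is testable without a DOM. The names are illustrative, not Do11y's actual internals:

```typescript
// Sketch of per-section visible-time accounting (illustrative names,
// not Do11y's real implementation). An IntersectionObserver callback
// would call enter()/leave() as sections cross the viewport.

type Clock = () => number;

class SectionTimer {
  private enteredAt = new Map<string, number>(); // currently visible sections
  private totals = new Map<string, number>();    // accumulated visible ms

  constructor(private now: Clock = () => Date.now()) {}

  // Section became visible in the viewport.
  enter(id: string): void {
    if (!this.enteredAt.has(id)) this.enteredAt.set(id, this.now());
  }

  // Section left the viewport: fold the elapsed time into its total.
  leave(id: string): void {
    const start = this.enteredAt.get(id);
    if (start === undefined) return; // leave without enter is a no-op
    this.enteredAt.delete(id);
    this.totals.set(id, (this.totals.get(id) ?? 0) + this.now() - start);
  }

  // Milliseconds of accumulated visibility for a section.
  visibleMs(id: string): number {
    return this.totals.get(id) ?? 0;
  }
}
```

This is what turns "scrolled past Configuration" and "spent 30 seconds on Configuration" into two different numbers instead of one pageview.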
The full event model includes page views, link clicks, scroll depth, page exits (with active time and engagement ratio), search opens, code copies, section visibility, tab switches, TOC clicks, feedback widget interactions, and expand/collapse toggles. Every event carries a session ID, path, viewport category, browser family, and device type. No cookies, PII, fingerprinting, or cross-site tracking.
Privacy as a design constraint
I want to dwell on this because it shaped every decision in the project.
The tempting approach to documentation observability is to stitch the user journey across docs and product. If you could see that someone read the ingest guide and then successfully sent their first event ten minutes later, you'd have a direct measurement of documentation effectiveness. The problem is that doing this requires identifying users across systems, which is a privacy problem and an engineering problem and a consent problem all at once.
Do11y deliberately avoids this. It uses sessionStorage, which the browser clears when the tab closes. There is no persistent identifier. There is no way to correlate a docs visit with a product action at the individual level. You can still learn a lot from aggregate patterns, how visitors from different sources engage with different pages, which pages have high search rates (a possible confusion signal), where multi-page sessions end. But you learn it without needing to know anything about any individual person.
This constraint turned out to be productive. When you can't track individuals across systems, you stop asking "did this specific user succeed?" and start asking "does this page create the conditions for success?" The second question is more useful anyway. It focuses on the content rather than the person.
You don't need a GDPR consent banner to use Do11y.
What it looks like in practice
The runtime artifact is a single JavaScript file with no dependencies. You add it to your docs site with a few lines of HTML:
```html
<meta name="axiom-do11y-domain" content="AXIOM_DOMAIN">
<meta name="axiom-do11y-token" content="API_TOKEN">
<meta name="axiom-do11y-dataset" content="DATASET_NAME">
<meta name="axiom-do11y-framework" content="FRAMEWORK">
<script src="https://cdn.jsdelivr.net/npm/@axiomhq/do11y@latest/dist/do11y.min.js"></script>
```

Set the FRAMEWORK value to your docs platform and Do11y auto-configures the CSS selectors it needs for search bars, copy buttons, code blocks, navigation, and content areas. It supports Mintlify, Docusaurus, Nextra, MkDocs Material, and VitePress out of the box.
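Under the hood, a framework value presumably keys into a table of per-platform selector presets. A hypothetical sketch of what that mapping might look like (the selector strings below are plausible examples, not Do11y's actual configuration):

```typescript
// Hypothetical sketch of per-framework selector presets. The selector
// strings are illustrative, not Do11y's real configuration.
interface Selectors {
  searchTrigger: string; // element that opens search
  codeBlock: string;     // code blocks whose copy events are tracked
  content: string;       // main content area for scroll/visibility
}

const FRAMEWORK_SELECTORS: Record<string, Selectors> = {
  docusaurus: {
    searchTrigger: ".DocSearch-Button",
    codeBlock: "pre code",
    content: "main article",
  },
  "mkdocs-material": {
    searchTrigger: ".md-search__input",
    codeBlock: "pre code",
    content: ".md-content",
  },
};

// Resolve the preset for a framework, or undefined if it needs
// custom selectors (e.g. a heavily customized theme).
function selectorsFor(framework: string): Selectors | undefined {
  return FRAMEWORK_SELECTORS[framework];
}
```

The point of the table is that the tracking logic stays framework-agnostic; only the selectors vary per platform.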
Events stream to Axiom in batches. You can query them with APL immediately. The repository includes a library of example queries covering traffic analysis, engagement scoring, navigation patterns, AI traffic detection, code block usage, and more.
Once you've added Do11y to your docs site and events start streaming to Axiom, Console automatically creates a pre-built dashboard with charts for all of the metrics above.
Agent-native, with caveats
Do11y classifies referrer domains to detect visits from AI platforms: ChatGPT, Perplexity, Claude, Gemini, Copilot, DeepSeek, and others. Each page_view event includes a referrerCategory field (which might be ai, search-engine, social, direct, and so on) and an aiPlatform field that names the specific platform when applicable.
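The classification itself reduces to a hostname lookup. A minimal sketch, assuming a small illustrative list (Do11y's real list covers more platforms and hostnames):

```typescript
// Sketch of referrer-based classification (hostname lists are
// illustrative and incomplete, not Do11y's actual tables).
const AI_PLATFORMS: Record<string, string> = {
  "chat.openai.com": "ChatGPT",
  "chatgpt.com": "ChatGPT",
  "www.perplexity.ai": "Perplexity",
  "claude.ai": "Claude",
  "gemini.google.com": "Gemini",
};

const SEARCH_ENGINES = new Set(["www.google.com", "www.bing.com", "duckduckgo.com"]);

function classifyReferrer(referrer: string): { category: string; aiPlatform?: string } {
  // No referrer header at all: the visit shows up as direct.
  if (referrer === "") return { category: "direct" };
  let host: string;
  try {
    host = new URL(referrer).hostname;
  } catch {
    return { category: "direct" }; // malformed referrer
  }
  const platform = AI_PLATFORMS[host];
  if (platform) return { category: "ai", aiPlatform: platform };
  if (SEARCH_ENGINES.has(host)) return { category: "search-engine" };
  return { category: "referral" };
}
```

The empty-referrer branch is exactly where the undercounting described below comes from: an AI platform that strips the referrer header is indistinguishable from a direct visit.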
The honest caveat: this detection is referrer-based, and most AI platforms don't reliably pass referrer headers. Referrer-based detection typically captures 20-40% of AI traffic. The rest shows up as direct. Detecting the "dark AI" traffic would require fingerprinting techniques that conflict with everything else Do11y stands for. So we don't.
Referrer detection is passive: the browser sends the referrer when a user follows a link, it names only the origin, and nothing persists on the visitor. Fingerprinting is active: it plants identifiers that cross site boundaries, which is exactly what Do11y avoids.
Even with that limitation, the data is revealing. You can see which pages AI platforms link to most, how AI-referred visitors engage compared to search-referred visitors, and how the ratio changes over time. If you're making decisions about content strategy in 2026, understanding AI as a traffic source is no longer optional.
What I deliberately left out
- Do11y doesn't capture search queries. It knows that someone opened search on a given page, which is a useful signal about the page. But the query text itself could contain anything, including something that looks like PII. So it stays in the browser.
- Do11y doesn't capture any text that the user typed, beyond what appears in UI elements like tab labels and feedback buttons. It doesn't record mouse movements. It doesn't record keystrokes. It doesn't take screenshots.
- Do11y doesn't attempt to correlate documentation visits with product usage at the individual level. As I said above, this is the trade-off that makes the whole privacy model work.
These aren't limitations I plan to fix. They are constraints I chose because the alternative was building a tool I wouldn't want pointed at myself.
The build
Like the placeholder configurator, Do11y started as something I built in the space between documentation work. The source is TypeScript, built with rolldown into JS bundles. The toolchain is intentionally minimal: TypeScript for type checking, rolldown for bundling, oxlint for linting, oxfmt for formatting. No framework, no runtime dependencies.
The test suite is the part I'm most proud of and the part that took the most time. Integration tests spin up actual instances of each supported framework, inject Do11y, drive interactions via headless Chromium, and then query the Axiom API to verify that events arrived correctly. Selector tests run against live production sites to catch framework updates that change the DOM. Query tests validate every APL query in the documentation against the Axiom API.
Selectors are the fragile part. Each docs framework renders its DOM differently, and those rendering choices change between versions. The tests exist to catch drift before it becomes a silent data gap. If you use a heavily customized theme, you may need to provide your own selectors.
Measure less, learn more
The instinct when approaching documentation measurement is to collect everything and figure out what matters later. This instinct is wrong. More data doesn't mean more understanding. It means more noise, more storage, more privacy surface area, and more work before you get to an insight.
The approach that worked was the opposite: start with a question you care about, figure out the minimum signal that could answer it, and collect that. "Which pages have high search rates?" is a better question than "what do people search for?" because it focuses on the content (is this page confusing?) rather than the person (what is this person looking for?). "Which sections do people spend time reading?" is better than "how far do people scroll?" because it gives you something actionable at the section level rather than a number that averages away all the useful variation.
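To make the "high search rate" question concrete: it reduces to a small aggregate over events. In practice this would be an APL query over the dataset; the sketch below shows the same computation as plain code, with hypothetical event shapes:

```typescript
// Sketch: per-page search rate (search opens per page view), computed
// over a stream of simplified, hypothetical events. A high rate is a
// possible signal that the page didn't answer the reader's question.
interface PageCounts { pageViews: number; searchOpens: number }

function searchRates(
  events: { type: "page_view" | "search_open"; path: string }[],
): Map<string, number> {
  const counts = new Map<string, PageCounts>();
  for (const e of events) {
    const c = counts.get(e.path) ?? { pageViews: 0, searchOpens: 0 };
    if (e.type === "page_view") c.pageViews++;
    else c.searchOpens++;
    counts.set(e.path, c);
  }
  // Rate per path; pages with no views are skipped.
  const rates = new Map<string, number>();
  for (const [path, c] of counts) {
    if (c.pageViews > 0) rates.set(path, c.searchOpens / c.pageViews);
  }
  return rates;
}
```

Note that the computation never touches the query text; the signal is about the page, not the person.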
Do11y is the result of this process of subtraction. It collects less than most analytics tools. What it collects is designed to answer specific questions about documentation effectiveness. I think that's the right trade-off for a tool that watches how people read.
Try it
Do11y is open source under the MIT license. The repository is at github.com/axiomhq/do11y, and the package is published as @axiomhq/do11y on npm.
If you maintain a documentation site and you've ever wondered whether your docs are actually working, not just being visited, give it a try. Set it up, let it run for a week, and look at what the data tells you. I think you'll find, as I did, that the most interesting insights come not from the answers but from the questions the data makes you ask.