- Review: A queue-based workflow for going through AI conversations and describing what went wrong.
- Issues: As you review, Axiom matches observations to existing issues or creates new ones. Categories aren’t defined upfront.
- Evidence: Each issue collects every conversation assigned to it, along with reviewer notes and links to the full trace.
- Signal to action: Feedback, online evaluations, and error traces surface problems. Review and issues help you understand them.
If you’re using Axiom’s AI SDK, you’ve probably got GenAI traces flowing, feedback coming in, maybe online evaluations running. The dashboards might look fine, but when an output goes wrong, what happens? Someone probably spots it, mentions it in Slack, maybe files an issue. The next bad output gets handled differently by someone else. Two weeks later you’re not sure if this is a new problem or one that’s been quietly recurring.
The missing piece is the bit between “that output was bad” and “we understand the problem well enough to fix it.” A domain expert on your team reading a conversation trace can see things no automated scorer will catch, like the agent confirming an action it never actually completed, or referencing a policy that was retired months ago. But those observations need somewhere to go, otherwise they stay in people’s heads and are never codified.
Today we’re releasing Review and Issues for AI capabilities in Axiom to address all this.
How review works
You pick a queue and work through the conversations in it.
There are five built-in queues:
- Flagged: traces your team marked for attention
- Recent: the latest conversations, for routine spot checks
- Negative feedback: traces where users gave a thumbs-down
- Errored: traces with exceptions
- Reviewed: past decisions you want to revisit
For each conversation, you make a binary call: good or bad. Then you write a note about what you saw.
In qualitative research this would be called open coding: observations in your own words, without fitting them into predefined categories. Something like “agent claimed upgrade succeeded but tool output shows it didn’t” or “proceeded with return without asking about refund method.” Honest notes. The categorization happens after, and Axiom advances to the next trace when you submit to keep you in flow.
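A review in this model is just the verdict plus the note. A hypothetical sketch, with class and field names chosen for illustration rather than taken from Axiom's SDK:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Review:
    # Illustrative shape only, not Axiom's API.
    conversation_id: str
    good: bool   # the binary call
    note: str    # open-coding observation in the reviewer's own words

review = Review(
    conversation_id="conv_42",   # hypothetical id
    good=False,
    note="agent claimed upgrade succeeded but tool output shows it didn't",
)
```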
From reviews to issues
When you submit a review, Axiom checks your review against existing issue categories for that capability. If it matches something that’s come up before, the conversation gets added to that issue. New kinds of problems become new issues. Vague reviews get logged but don’t create anything.
You don’t need to define your failure taxonomy in advance. It builds itself from what reviewers write down. Every categorization decision is logged with its reasoning, so you can see why a conversation ended up where it did and correct it if needed.
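The matching step can be pictured roughly like this. Axiom's actual categorization isn't described here, so a naive word-overlap score stands in for it; the `Issue` shape and the threshold are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Issue:
    # Hypothetical issue shape for illustration only.
    title: str
    conversations: list = field(default_factory=list)

def words(text):
    return set(text.lower().split())

def assign(review_note, conversation_id, issues, threshold=0.5):
    """Attach the conversation to the best-matching issue, or open a new one.

    Returns the issue and a logged reason, mirroring the idea that every
    categorization decision is recorded with its reasoning.
    """
    best, best_score = None, 0.0
    for issue in issues:
        overlap = words(review_note) & words(issue.title)
        score = len(overlap) / max(len(words(issue.title)), 1)
        if score > best_score:
            best, best_score = issue, score
    if best is not None and best_score >= threshold:
        best.conversations.append(conversation_id)
        return best, f"matched '{best.title}' (score {best_score:.2f})"
    new = Issue(title=review_note, conversations=[conversation_id])
    issues.append(new)
    return new, "no existing issue matched; created new"
```

The point of the sketch is the control flow, not the scoring: strong matches accrete onto existing issues, and anything genuinely new seeds an issue of its own, so the taxonomy grows out of the notes reviewers actually write.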
Issues
After a week of reviews, your issues list might have entries like “confirmed action not completed” with 12 conversations, “skipped required confirmation” with 8, and “referenced data not in tool response” with 3. All of these started as freeform reviews from different people on the team, with Axiom helping you find the patterns.
When you open an issue, you see every conversation assigned to it with the reviewer’s note and a link to the full trace. You might notice that “action not completed” keeps showing up around one specific tool integration, or that “skipped confirmation” is limited to multi-step requests. That tells you where to look and what test cases to write.
If you resolve an issue and new conversations get assigned to it later, the issue reopens.
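That lifecycle can be sketched as a small state machine. The `status` values and method names here are illustrative, not Axiom's API:

```python
from dataclasses import dataclass, field

@dataclass
class Issue:
    # Hypothetical issue shape for illustration only.
    title: str
    status: str = "open"
    conversations: list = field(default_factory=list)

    def resolve(self):
        self.status = "resolved"

    def add_conversation(self, conversation_id):
        self.conversations.append(conversation_id)
        if self.status == "resolved":
            self.status = "open"   # new evidence reopens the issue
```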
Where this fits
This is the last piece of the toolkit we’ve been building. You instrument capabilities, run evaluations against test cases, collect feedback in production, score live traffic with online evaluations, and now you can review what needs human attention and track the issues that come out of it.
Get started with review workflows in the Review conversations and Track issues docs.