Arize Phoenix

Open source LLM tracing and evaluation that captures spans, scores, prompts, and outputs, clusters failures, and offers a hosted AX service with free and enterprise tiers.

Starting Price: Free, SaaS tiers by quote
Category: data
Setup Time: < 2 minutes
Difficulty: Beginner
Status: Active
Type: Web App

Try Arize Phoenix

What is Arize Phoenix?

See where LLM apps fail and fix them with open source tracing, evaluation, clustering, and an optional hosted service

Arize Phoenix is an open source platform for observing and improving LLM applications. It instruments prompts, tool calls, and model outputs as OpenTelemetry-style traces, then lets you cluster conversations and error modes to discover why answers fail. Built-in evaluators score hallucination, relevance, and toxicity, and you can bring custom metrics to compare models or prompt versions. Phoenix runs locally or in your cloud, while the hosted AX service adds longer retention, higher limits, the Alyx copilot, and support. Teams use Phoenix to debug retrieval augmented generation (RAG) pipelines, tune prompts and guardrails, and track regressions during model upgrades. The UI highlights outliers and similar failures so you can fix the root cause instead of guessing. The combination of OSS flexibility and SaaS convenience makes it easy to start small and scale governance as traffic grows.
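
Getting traces flowing typically takes only a few lines of Python. Below is a minimal sketch, assuming the arize-phoenix package's phoenix.otel register helper and the OpenInference OpenAI instrumentor; module paths, defaults, and collector endpoints may differ by version.

    import phoenix as px
    from phoenix.otel import register
    from openinference.instrumentation.openai import OpenAIInstrumentor

    # Start a local Phoenix UI; point the exporter at a self-hosted or hosted AX
    # collector instead if you are not running locally.
    px.launch_app()

    # Register an OpenTelemetry tracer provider that exports spans to Phoenix.
    tracer_provider = register(project_name="my-llm-app")

    # Auto-instrument OpenAI calls so prompts, outputs, latencies, and token counts
    # are recorded as spans without manual wrapping.
    OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

From there, every instrumented call appears as a trace in the UI, ready for clustering and evaluation.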

Key Capabilities

What makes Arize Phoenix powerful

Spans and Context

Capture prompts, tools, outputs, latencies, and metadata as traces to understand context and cost across sessions.

Implementation Level: Professional
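
For steps that auto-instrumentation does not cover, spans can be opened manually with the standard OpenTelemetry API. A hedged sketch follows; the attribute keys (input.value, output.value, llm.model_name) follow OpenInference-style conventions and are illustrative assumptions rather than a guaranteed schema.

    from opentelemetry import trace

    # Assumes a tracer provider (e.g. from the register() sketch above) is configured.
    tracer = trace.get_tracer("my-llm-app")

    def answer(question: str) -> str:
        # One span per logical step; nested spans form the trace tree Phoenix displays.
        with tracer.start_as_current_span("answer_question") as span:
            span.set_attribute("input.value", question)          # prompt text
            span.set_attribute("llm.model_name", "gpt-4o-mini")  # metadata for cost breakdowns
            result = f"(stub) answer to: {question}"              # replace with a real model call
            span.set_attribute("output.value", result)            # model output
            return result

    print(answer("What does Phoenix capture?"))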

Built-in and Custom Evals

Run relevance, faithfulness, and safety checks, or plug in your own metrics to compare models and prompts.

Implementation Level: Intermediate
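
As a concrete example, here is a hedged sketch of a built-in hallucination check run over a small DataFrame, assuming the phoenix.evals helpers llm_classify, OpenAIModel, and the hallucination prompt template; names, parameters, and expected columns can vary by version.

    import pandas as pd
    from phoenix.evals import (
        OpenAIModel,
        llm_classify,
        HALLUCINATION_PROMPT_TEMPLATE,
        HALLUCINATION_PROMPT_RAILS_MAP,
    )

    # Rows to judge; the hallucination template reads input, reference (context), and output columns.
    df = pd.DataFrame(
        {
            "input": ["What is Arize Phoenix?"],
            "reference": ["Phoenix is an open source LLM observability project."],
            "output": ["Phoenix is a 1990s spreadsheet application."],
        }
    )

    rails = list(HALLUCINATION_PROMPT_RAILS_MAP.values())  # e.g. ["hallucinated", "factual"]
    results = llm_classify(
        dataframe=df,
        template=HALLUCINATION_PROMPT_TEMPLATE,
        model=OpenAIModel(model="gpt-4o-mini"),
        rails=rails,
    )
    print(results["label"])  # one judged label per row

Custom evaluators slot in the same way: swap in your own template and rails, or score rows with any function and log the results alongside the built-in checks.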

Clustering and Search

Group similar failures or topics to find root causes and prioritize fixes instead of chasing anecdotes.

Implementation Level: Intermediate

Hosted AX

Use the SaaS plan for higher limits, longer retention, multi-user access, and guided workflows via Alyx.

Implementation Level: Basic

Professional Integration

These capabilities work together: traces supply the spans that evaluators score and that clustering groups, so teams move from raw telemetry to prioritized fixes in one workflow, whether self-hosted or on hosted AX.

Key Features

What makes Arize Phoenix stand out

  • Open source tracing and evaluation built on OpenTelemetry
  • Span capture for prompts, tools, model outputs, and latencies
  • Clustering to reveal failure patterns across sessions
  • Built-in evals for relevance, hallucination, and safety
  • Compare models, prompts, and guardrails with custom metrics
  • Self-host or use hosted AX with expanded limits and support
  • Alyx copilot to guide debugging in the hosted plan
  • REST and Python SDKs with notebooks and quickstarts

Use Cases

How Arize Phoenix can help you

  • Trace and debug RAG pipelines across tools and models
  • Cluster bad answers to identify data or prompt gaps
  • Score outputs for relevance, faithfulness, and safety
  • Run A/B tests on prompts with offline or online traffic
  • Add governance with retention, access control, and SLAs
  • Share findings with engineering and product via notebooks (see the sketch after this list)
  • Start self-hosted, then move heavy traffic to hosted AX
  • Educate teams on failure modes with concrete traces
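
For the notebook workflow referenced above, here is a hedged sketch of exporting spans for offline analysis, assuming a locally running Phoenix instance and a client that exposes get_spans_dataframe; the filter syntax and column names are assumptions that may differ by version.

    import phoenix as px

    # Connect to a running Phoenix instance (local by default).
    client = px.Client()

    # Pull LLM spans into a pandas DataFrame; the filter string is illustrative.
    spans_df = client.get_spans_dataframe("span_kind == 'LLM'")

    # Rank the slowest calls to share with engineering and product.
    spans_df["latency_s"] = (spans_df["end_time"] - spans_df["start_time"]).dt.total_seconds()
    slowest = spans_df.sort_values("latency_s", ascending=False).head(10)
    print(slowest[["name", "latency_s"]])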

Perfect For

ML engineers, data scientists, and platform teams building LLM apps who need open source tracing, evals, and an optional hosted path as usage grows

Pricing

Start using Arize Phoenix today

Starting price: Free, SaaS tiers by quote

Get Started

Quick Information

Category: data
Pricing Model: Freemium
Last Updated: 12/21/2025

Compare Arize Phoenix with Alternatives

See how Arize Phoenix stacks up against similar tools

Frequently Asked Questions

How does pricing start?
Phoenix is open source and free to self-host, while the hosted AX service lists a free tier with paid upgrades by quote.
Can I use Phoenix without SaaS?
Yes, the core is fully open source and self-hostable with no feature gates.
What limits exist in the free hosted tier?
AX Free lists limits on users, spans, and retention that increase with paid plans.
Does Phoenix support OpenTelemetry?
Yes, the project embraces OTEL-style spans for vendor-neutral instrumentation.
Can I bring custom evaluators?
You can plug in custom metrics alongside built-in relevance and safety checks.
Is there a co pilot or assistant?
The hosted plan includes Alyx to guide debugging and exploration.
How do I compare prompts or models?
Create experiments to score versions side by side and visualize tradeoffs.
Do you offer enterprise support?
An enterprise plan is available for organizations that require SLAs and private deployments.

Similar Tools to Explore

Discover other AI tools that might meet your needs


Akkio

data

No code AI analytics for agencies and businesses to clean data, build predictive models, analyze performance and automate reporting with team friendly pricing.

Free trial / Starts $49 per month

Algolia

data

Hosted search and discovery with ultra fast indexing, typo tolerance, vector and keyword hybrid search, analytics and Rules for merchandising across web and apps.

Free / Usage based

Alteryx

data

Analytics automation platform that blends and preps data, builds code free and code friendly workflows, and deploys predictive models with governed sharing at scale.

Starts $250 per user per month

AI21 Labs

research

Advanced language models and developer platform for reasoning, writing and structured outputs with APIs, tooling and enterprise controls for reliable LLM applications.

Free credits / Pay as you go

Aleph Alpha

research

Enterprise AI models and tooling focused on sovereignty, privacy and controllability with on premise options, advanced reasoning and transparency features for regulated users.

By quote

Anthropic API

coding

Programmatic access to Anthropic models for chat completions, tool use and batch jobs with usage based pricing and enterprise controls across regions and clouds.

Usage based, from approx $0.25 per 1M input tokens on Haiku