Athina AI raises $3M in new funding.
Learn more

Ship AI to prod 10x faster


Athina is a collaborative AI development platform designed for your team to build, test and monitor AI features.

Trusted by world-class AI teams

Evaluate your datasets using 50+ preset evals, or configure custom evals.
Watch demo
Re-generate datasets by changing the model, prompt, or retriever in a few clicks.
Watch demo
Allow your team to verify evaluation results and annotate datasets.
Book a demo
Prototype powerful chains and run them programmatically.
Book a demo

Collaborate with your entire team

Athina enables both non-technical and technical users to collaborate on experiments, evaluate datasets, and manage prompts and flows.

Data Scientists

Work with your Data

Compare datasets side-by-side and interact with your datasets in powerful ways using SQL.

Explore documentation
Watch Demo
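Athina exposes its own SQL interface over datasets. As a rough illustration of the kind of side-by-side comparison this enables, here is a minimal sketch using Python's stdlib sqlite3 (the table and column names are invented for the example):

```python
import sqlite3

# Illustrative only: Athina provides its own SQL interface over datasets.
# This sketch mimics comparing two dataset runs with stdlib sqlite3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run_a (query TEXT, score REAL)")
conn.execute("CREATE TABLE run_b (query TEXT, score REAL)")
conn.executemany("INSERT INTO run_a VALUES (?, ?)",
                 [("q1", 0.82), ("q2", 0.64)])
conn.executemany("INSERT INTO run_b VALUES (?, ?)",
                 [("q1", 0.90), ("q2", 0.58)])

# Compare eval scores for the same query across the two runs.
rows = conn.execute("""
    SELECT a.query, a.score AS score_a, b.score AS score_b
    FROM run_a a JOIN run_b b ON a.query = b.query
    ORDER BY a.query
""").fetchall()
print(rows)  # [('q1', 0.82, 0.9), ('q2', 0.64, 0.58)]
```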

Product Managers

No-code AI Engineering

Build complex AI flows without the engineering complexity

Book a demo
Watch Demo
Service Item Image

QA Team

For your Human QA team

Humans can pick up on nuances that automated evals might miss.

Athina is designed for human QA teams to work side-by-side with AI evaluations.

Book a demo
Watch Demo

Engineers

Access everything in just a few lines of code

Everything in Athina works with or without your code.

Engineers are able to run prompts, flows, and evaluations programmatically, while non-technical users can use the UI.

Explore documentation
Watch Demo

import os
from athina.evals import (
    RagasAnswerCorrectness,
    RagasContextPrecision,
    RagasContextRelevancy,
    RagasContextRecall,
    RagasFaithfulness,
    ResponseFaithfulness,
    Groundedness,
    ContextSufficiency,
)
from athina.loaders import Loader
from athina.keys import AthinaApiKey, OpenAiApiKey
from athina.runner.run import EvalRunner
from athina.datasets import yc_query_mini

from dotenv import load_dotenv
load_dotenv()

# Configure API keys.
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))

# Load the sample dataset into the format the evals expect
# (see the Athina docs for Loader usage).
dataset = Loader().load_dict(yc_query_mini.data)

# Evaluate the dataset across a suite of eval criteria.
EvalRunner.run_suite(
    evals=[
        RagasAnswerCorrectness(),
        RagasContextPrecision(),
        RagasContextRelevancy(),
        RagasContextRecall(),
        RagasFaithfulness(),
        ResponseFaithfulness(),
        Groundedness(),
        ContextSufficiency(),
    ],
    data=dataset,
    max_parallel_evals=10
)


import os
from athina_client.prompt import Prompt, Slug
from athina_client.keys import AthinaApiKey

AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))

Prompt.create_prompt(
    slug='test-staging',
    prompt=[{
        "role": "system",
        "content": "You are an AI that answers questions in less than 50 words"
    },
    {
        "role": "user",
        "content": "What does {{company}} do?"
    }],
    model="gpt-4o",
    commit_message="Initial prompt commit",
    parameters={
        "temperature": 0.5
    }
)


Prompt.run_prompt(
    slug='test-staging',
    # the following fields are optional
    version=2,
    model="gpt-4o",
    variables={
        "company": "NVIDIA"
    },
    parameters={
        "temperature": 1,
        "max_tokens": 1000
    },
)


import os
from openai import OpenAI
from athina_logger.api_key import AthinaApiKey
from athina_logger.inference_logger import InferenceLogger
from athina_logger.exception.custom_exception import CustomException

AthinaApiKey.set_api_key(os.getenv('ATHINA_API_KEY'))

client = OpenAI()
messages = [{"role": "user", "content": "What is machine learning?"}]

response = client.chat.completions.create(
    model='gpt-4-1106-preview',
    messages=messages,
)

response = response.model_dump()  # for openai >= 1.0

try:
  InferenceLogger.log_inference(
      prompt_slug="sdk_test",
      prompt=messages,
      language_model_id="gpt-4-1106-preview",
      response=response,
      external_reference_id="abc",
      cost=0.0123,
      custom_attributes={
          "name": "John Doe"
          # Your custom attributes
      }
  )
except Exception as e:
  if isinstance(e, CustomException):
      print(e.status_code)
      print(e.message)
  else:
      print(e)



import os
from athina_client.keys import AthinaApiKey

AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))

from athina_client.datasets import Dataset

try:
  dataset = Dataset.create(
    name='test_dataset',
    description="Optional description", # optional
    language_model_id="gpt-4", # optional
    rows=[
      { "query": "Who let the dogs out?", "response": "Who, who, who, who, who?" },
      ...
    ]
  )
except Exception as e:
  print(f"Failed to create dataset: {e}")


query GetPromptRuns($limit: Int!, $page: Int!) {
  getPromptRunsByFilters(limit: $limit, page: $page) {
    id
    org_id
    prompt_slug
    language_model_id
    prompt_response
    prompt_tokens
  }
}
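The query above is standard GraphQL, so it can be sent as an ordinary POST payload. A minimal sketch in Python follows; the endpoint URL and auth header in the comment are assumptions, so check the API docs for the real values:

```python
import json

# The GraphQL query from above, wrapped in a standard POST payload.
QUERY = """
query GetPromptRuns($limit: Int!, $page: Int!) {
  getPromptRunsByFilters(limit: $limit, page: $page) {
    id
    prompt_slug
    prompt_response
  }
}
"""

payload = json.dumps({
    "query": QUERY,
    "variables": {"limit": 50, "page": 1},
})

# Hypothetical endpoint and header name -- verify against the docs:
# requests.post("https://api.athina.ai/graphql", data=payload,
#               headers={"athina-api-key": "..."})
print(json.loads(payload)["variables"])  # {'limit': 50, 'page': 1}
```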

Complete visibility into your production AI

Powerful Monitoring, designed for AI

AI traces have different monitoring requirements than traditional applications. Athina is built natively for capturing LLM traces.

Trace every step, every time

Tracing in Athina captures every step of your LLM flows, so you can replay what happened at each step of a trace.

Continuous evaluation

Athina's online evaluations can be configured to run on your logs as they come in, so you always have visibility into accuracy.

Segmented Analytics

Understand how model performance changes over time and across different segments.

Analytics in Athina are segmented at every level, so you can compare eval scores by prompt, model, topic or customer ID.
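As a rough illustration of what segment-level aggregation looks like (the log fields here are invented for the example), eval scores can be grouped by any attribute and averaged per segment:

```python
from collections import defaultdict

# Illustrative sketch: segmenting eval scores by model, the way the
# dashboard slices them by prompt, model, topic, or customer ID.
logs = [
    {"model": "gpt-4o", "eval_score": 0.91},
    {"model": "gpt-4o", "eval_score": 0.87},
    {"model": "gpt-3.5-turbo", "eval_score": 0.74},
]

by_model = defaultdict(list)
for log in logs:
    by_model[log["model"]].append(log["eval_score"])

# Average score per segment.
averages = {model: sum(s) / len(s) for model, s in by_model.items()}
print(averages)  # gpt-4o averages ~0.89, gpt-3.5-turbo 0.74
```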

Your data, your rules

Athina ensures full data privacy with fine-grained access controls and deployment in your cloud environment.

Access controls

Configure fine-grained permissions so you can control which users can access different features and data.

Self-hosted Deployments

Deploy Athina entirely in your own VPC.

Complete data privacy.

SOC-2 Type 2 compliant

Athina is compliant with SOC-2 Type 2 standards, ensuring that your data is secure and protected.

Use custom models

Access custom models and providers like Azure OpenAI, AWS Bedrock, and more.

See how teams are accelerating AI development with Athina

Vetted

Athina has a broad set of powerful tools. We use it from prototyping a new idea, to refining it, to monitoring it in production. It's a great product: comprehensive and user-friendly. We initially used Athina primarily for logging, but it has become increasingly integrated into our model development and evaluation process.

Andris Pelcbergs & Maria Gaska (Head of AI, Vetted)

Richpanel

We are using Athina at Richpanel to build evals for our Customer Support AI Agents and already love the product. I would recommend it to everyone who wants to build reliably with LLMs.

Ashutosh Dubey, Engineer

CourtCorrect

My team and I reviewed 10+ frameworks for LLM experimentation and observability. We ended up going with Athina and are very happy with our choice! The experimentation suite is really flexible, and integrating our applications and existing observability stack was really smooth. Big + for exposing observability data via API!

Robin Saberi, Head of AI (CourtCorrect)

PhysicsWallah

We've been using Athina for a couple of months. I strongly believe LLM applications in production need strong observability, and Athina fits the bill. Seamless prototyping with prompts is also a great feature. Looking forward to a longer collaboration :)

Sandeep Varma, AI Lead

You.com

We've been using Athina AI and it's been saving us so much time with our annotations. Previously, we'd have to curate our datasets amongst different annotators on Google Sheets and it's a massively painful process. With Athina, we're able to curate our datasets (with inter-annotator agreements) much more easily and create much higher quality Evals.

Jason Tang, Staff SWE

Frequently asked questions

Find out more about how Athina works, how to integrate it, and how it can accelerate your AI development process.

Does Athina have a self-hosted deployment option?

Yes, Athina can be deployed as a self-hosted image. Contact hello@athina.ai for more information.

Does Athina logging add any latency?

Nope, Athina logging can be performed as an async fire-and-forget operation, so it won't impact your latency.
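The fire-and-forget pattern can be sketched in a few lines; `send_log` below is a hypothetical stand-in for the actual logging call:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of fire-and-forget logging: the log call runs on a background
# thread so the request path never waits on it. `send_log` is a
# hypothetical stand-in for the real logging call.
executor = ThreadPoolExecutor(max_workers=2)
sent = []

def send_log(payload):
    # In practice this would POST to the logging API.
    sent.append(payload)

def handle_request(user_query):
    response = f"answer to: {user_query}"  # your LLM call
    executor.submit(send_log, {"prompt": user_query, "response": response})
    return response  # returns without waiting on the log

result = handle_request("What is machine learning?")
executor.shutdown(wait=True)  # only needed to flush this demo
print(result)  # answer to: What is machine learning?
```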

Does Athina support custom evaluations?

Yes, Athina enables you to configure custom evaluators. You can use a custom LLM evaluation, write a custom Python function, or even call an external API for evaluation.
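A custom Python-function evaluator boils down to a function from a datapoint to a pass/fail result; the exact registration API is covered in the docs. This standalone sketch (the function name and return shape are illustrative) shows the idea:

```python
# Illustrative custom evaluator: flags responses that appear to leak an
# email address. The name and return shape are invented for this sketch.
def contains_no_pii(datapoint: dict) -> dict:
    """Fails if the response appears to contain an email address."""
    response = datapoint["response"]
    has_email = "@" in response and "." in response.split("@")[-1]
    return {
        "passed": not has_email,
        "reason": "email-like string found" if has_email else "ok",
    }

result = contains_no_pii({"query": "contact?",
                          "response": "Mail us at hi@acme.com"})
print(result["passed"])  # False
```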

Does Athina work with Azure / Vertex / Bedrock?

Yes, you can use custom models hosted anywhere using Athina.

How long does Athina take to integrate?

You can get set up with logging in just a few minutes. Visit https://docs.athina.ai/logging to get started.

What kind of evaluations does Athina support?

Athina supports over 50 preset evaluations from providers like Athina, OpenAI, Ragas, Guardrails, and more. You can also configure custom evaluations using LLM-as-a-judge or custom Python functions.

Pricing

Flexible pricing for teams of every size.

Starter

Free

Get started

10k logs/mo

Advanced analytics

Unlimited prompts

Compare prompts and models

Track cost, latency, and other metrics

Pro

Let's talk

Book a demo

Everything in Starter

Unlimited logs

Unlimited evals

Unlimited datasets

Unlimited team seats

White-glove support

GraphQL API

Enterprise

Custom pricing

Book a call

Everything in Pro

Self-hosted deployment

SOC-2 Type 2 certification

Advanced access controls

Support for custom models

Get started with Athina today

Join the world’s leading teams in building safe, reliable AI systems