Ship AI to prod 10x faster

Athina is a collaborative AI development platform designed for your team to build, test and monitor AI features.

Get started for free Watch demo

Trusted by World Class AI teams

Manage, test and run prompts with any model, including custom models

Evaluate your datasets using 50+ preset evals, or configure custom evals.

Re-generate datasets by changing the model, prompt, or retriever in a few clicks.

Allow your team to verify evaluation results and annotate datasets

Prototype powerful chains and run them programmatically

Collaborate with your entire team

Athina enables both technical and non-technical users to collaborate on running experiments, evaluating datasets, and managing prompts and flows.

Data Scientists

Work with your Data

Compare datasets side-by-side and interact with your datasets in powerful ways using SQL.
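
As a rough illustration of the kind of side-by-side comparison SQL makes possible, the sketch below uses Python's built-in sqlite3 module rather than Athina itself. The table names (`run_a`, `run_b`), columns, and rows are hypothetical placeholders, not Athina's schema.

```python
import sqlite3

# In-memory database standing in for two versions of the same dataset.
# Table and column names are illustrative assumptions, not Athina's schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE run_a (query TEXT PRIMARY KEY, response TEXT);
    CREATE TABLE run_b (query TEXT PRIMARY KEY, response TEXT);
    """
)
conn.executemany(
    "INSERT INTO run_a VALUES (?, ?)",
    [("q1", "answer one"), ("q2", "answer two")],
)
conn.executemany(
    "INSERT INTO run_b VALUES (?, ?)",
    [("q1", "answer one"), ("q2", "a different answer")],
)

# Join the two runs on the query and keep only rows whose responses differ.
changed = conn.execute(
    """
    SELECT a.query, a.response, b.response
    FROM run_a AS a
    JOIN run_b AS b ON a.query = b.query
    WHERE a.response <> b.response
    ORDER BY a.query
    """
).fetchall()

for query, old, new in changed:
    print(f"{query}: {old!r} -> {new!r}")
```

Joining on the query column keeps each pair of responses on one row, which is what makes the differences easy to scan.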

Product Manager

No-code AI Engineering

Build complex AI flows without the engineering complexity

QA Team

For your Human QA team

Humans can pick up on nuances that automated evals might miss.

Athina is designed for human QA teams to work side-by-side with AI evaluations.

Engineers

Access everything in just a few lines of code

Everything in Athina works with or without your code.

Engineers are able to run prompts, flows, and evaluations programmatically, while non-technical users can use the UI.


import os
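# Import the eval classes used in the suite below
# (assumed to be exported by athina.evals under these names).
from athina.evals import (
    RagasAnswerCorrectness,
    RagasContextPrecision,
    RagasContextRelevancy,
    RagasContextRecall,
    RagasFaithfulness,
    ResponseFaithfulness,
    Groundedness,
    ContextSufficiency,
)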
from athina.loaders import Loader
from athina.keys import AthinaApiKey, OpenAiApiKey
from athina.runner.run import EvalRunner
from athina.datasets import yc_query_mini
import pandas as pd

from dotenv import load_dotenv
load_dotenv()

# Configure an API key.
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))


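# Load the sample dataset imported above.
# Assumption: yc_query_mini exposes its rows as `.data` and Loader.load_dict accepts them.
dataset = Loader().load_dict(yc_query_mini.data)

# Evaluate a dataset across a suite of eval criteria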
EvalRunner.run_suite(
    evals=[
        RagasAnswerCorrectness(),
        RagasContextPrecision(),
        RagasContextRelevancy(),
        RagasContextRecall(),
        RagasFaithfulness(),
        ResponseFaithfulness(),
        Groundedness(),
        ContextSufficiency(),
    ],
    data=dataset,
    max_parallel_evals=10
)


import os
from athina_client.prompt import Prompt, Slug
from athina_client.keys import AthinaApiKey

AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))

Prompt.create_prompt(
    slug='test-staging',
    prompt=[{
        "role": "system",
        "content": "You are an AI that answers questions in less than 50 words"
    },
    {
        "role": "user",
        "content": "What does {{company}} do?"
    }],
    model="gpt-4o",
    commit_message="Initial prompt commit",
    parameters={
        "temperature": 0.5
    }
)


Prompt.run_prompt(
    slug='test-staging',
    # the following fields are optional
    version=2,
    model="gpt-4o",
    variables={
        "company": "NVIDIA"
    },
    parameters={
        "temperature": 1,
        "max_tokens": 1000
    },
)


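import os
from openai import OpenAI
from athina_logger.api_key import AthinaApiKey
# Assumption: these module paths match the installed athina_logger package.
from athina_logger.inference_logger import InferenceLogger
from athina_logger.exception.custom_exception import CustomException

AthinaApiKey.set_api_key(os.getenv('ATHINA_API_KEY'))

# OpenAI client; reads OPENAI_API_KEY from the environment.
client = OpenAI()

# Define the messages once so the same list is sent to the model and logged.
messages = [{"role": "user", "content": "What is machine learning?"}]

response = client.chat.completions.create(
    model='gpt-4-1106-preview',
    messages=messages,
)

# For openai >= 1.0, convert the response object to a plain dict before logging.
response = response.model_dump()

try:
    InferenceLogger.log_inference(
        prompt_slug="sdk_test",
        prompt=messages,
        language_model_id="gpt-4-1106-preview",
        response=response,
        external_reference_id="abc",
        cost=0.0123,
        custom_attributes={
            "name": "John Doe"
            # Your custom attributes
        }
    )
except CustomException as e:
    # Errors raised by the logger carry a status code and a message.
    print(e.status_code)
    print(e.message)
except Exception as e:
    print(e)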


import os
from athina_client.keys import AthinaApiKey

AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))

from athina_client.datasets import Dataset

try:
  dataset = Dataset.create(
    name='test_dataset',
    description="Optional description", # optional
    language_model_id="gpt-4", # optional,
    rows=[
      { "query": "Who let the dogs out?", "response": "Who, who, who, who, who?" },
      ...
    ]
  )
except Exception as e:
  print(f"Failed to create dataset: {e}")


query GetPromptRuns($limit: Int!, $page: Int!) {
  getPromptRunsByFilters(limit: $limit, page: $page) {
    id
    org_id
    prompt_slug
    language_model_id
    prompt_response
    prompt_tokens
  }
}
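
The query above can be packaged as a standard GraphQL-over-HTTP JSON body. The sketch below only builds that body; the endpoint URL and authentication headers are not shown on this page, so they are left out rather than guessed. The helper name `build_graphql_payload` and the example variable values are hypothetical.

```python
import json

# Query text copied from the example above.
GET_PROMPT_RUNS = """
query GetPromptRuns($limit: Int!, $page: Int!) {
  getPromptRunsByFilters(limit: $limit, page: $page) {
    id
    org_id
    prompt_slug
    language_model_id
    prompt_response
    prompt_tokens
  }
}
"""


def build_graphql_payload(limit: int, page: int) -> str:
    """Serialize the query and its variables as a GraphQL JSON request body.

    Hypothetical helper: the endpoint and auth headers to send this to
    are not part of this sketch.
    """
    return json.dumps(
        {
            "query": GET_PROMPT_RUNS,
            "operationName": "GetPromptRuns",
            "variables": {"limit": limit, "page": page},
        }
    )


# Example values only; whether pages start at 0 or 1 is an assumption.
body = build_graphql_payload(limit=10, page=1)
print(body[:60])
```

Any HTTP client can POST this body with a `Content-Type: application/json` header once the endpoint and credentials are known.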

Complete visibility into your production AI

Powerful Monitoring, designed for AI

AI traces have different monitoring requirements than traditional applications. Athina is built natively for capturing LLM traces.

View logging docs

Trace every step, every time

Tracing in Athina captures every step of your LLM flows, so you can replay what happened at every step of a trace.

Continuous evaluation

Athina's online evaluations can be configured to run on your logs as they come in, so you always have visibility into accuracy.

Segmented Analytics

Understand how model performance changes over time and across different segments

Compare

Analytics in Athina are segmented at every level, so you can compare eval scores by prompt, model, topic or customer ID.
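
To make the idea of segmented scores concrete, here is a minimal sketch that averages an eval score per segment using only the standard library. The record fields (`prompt_slug`, `model`, `score`) and the values are illustrative assumptions, not output from Athina.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by(rows, segment_key):
    """Group eval records by one field and return the mean score per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[segment_key]].append(row["score"])
    return {segment: mean(scores) for segment, scores in groups.items()}


# Toy records for illustration only.
rows = [
    {"prompt_slug": "qa-v1", "model": "gpt-4o", "score": 0.5},
    {"prompt_slug": "qa-v1", "model": "gpt-4", "score": 1.0},
    {"prompt_slug": "qa-v2", "model": "gpt-4o", "score": 1.0},
    {"prompt_slug": "qa-v2", "model": "gpt-4o", "score": 0.5},
]

by_model = mean_score_by(rows, "model")
by_prompt = mean_score_by(rows, "prompt_slug")
print(by_model)
print(by_prompt)
```

The same records give different pictures depending on the segment key, which is the point of slicing by prompt, model, topic, or customer ID.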

Your data, your rules

Athina ensures full data privacy with fine-grained access controls and deployment in your cloud environment.

Access controls

Configure fine-grained permissions so you can control which users can access different features and data.

Self-hosted Deployments

Deploy Athina entirely in your own VPC.

Complete data privacy.

SOC-2 Type 2 compliant

Athina is compliant with SOC-2 Type 2 standards, ensuring that your data is secure and protected.

Use custom models

Access custom models and providers like Azure OpenAI, AWS Bedrock, and more.

See how teams are accelerating AI development with Athina

Vetted

Athina has a broad set of powerful tools. We use it from prototyping a new idea, to refining it, to monitoring it in production. It's a great product: comprehensive and user-friendly. We initially used Athina primarily for logging, but it has become increasingly integrated into our model development and evaluation process.

Andris Pelcbergs & Maria Gaska (Head of AI, Vetted)

Richpanel

We are using Athina at Richpanel to build evals for our Customer Support AI Agents and already love the product. I would recommend it to everyone who wants to build reliably with LLMs.

Ashutosh Dubey, Engineer

CourtCorrect

Me and my team reviewed 10+ frameworks for LLM experimentation and observability. We ended up going with Athina and are very happy with our choice! The experimentation suite is really flexible and integrating our applications and existing observability stack was really smooth. Big + for exposing observability data via API!

Robin Saberi, Head of AI (CourtCorrect)

PhysicsWallah

Have been using Athina from a couple of months. I strongly believe LLM applications in production needs a strong observability and athina fits the bill. Also, seamless prototyping with prompts is also a great feature. Looking forward for a longer collaboration :)

Sandeep Varma, AI Lead

You.com

We've been using Athina AI and it's been saving us so much time with our annotations. Previously, we'd have to curate our datasets amongst different annotators on Google Sheets and it's a massively painful process. With Athina, we're able to curate our datasets (with inter-annotator agreements) much more easily and create much higher quality Evals.

Jason Tang, Staff SWE

Frequently asked questions

Find out more about how Athina works, how to integrate it, and how it can accelerate your AI development process

Does Athina have a self-hosted deployment option?

Yes, Athina can be deployed as a self-hosted image. Contact hello@athina.ai for more information.

Does Athina logging add any latency?

Nope, Athina logging can be performed as an async fire-and-forget operation, so it won't impact your latency.
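
For readers unfamiliar with the pattern, the sketch below shows one generic way to do fire-and-forget logging with a background thread and a queue. It is not Athina's SDK internals; `FireAndForgetLogger` and the `send` callable are hypothetical stand-ins for whatever function actually ships a log record.

```python
import queue
import threading


class FireAndForgetLogger:
    """Queue log records and send them on a background thread.

    Hypothetical helper: `send` stands in for any function that ships one record.
    """

    def __init__(self, send):
        self._send = send
        self._queue = queue.Queue()
        # Daemon thread so pending logs never block interpreter shutdown.
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, record):
        # Enqueue and return immediately; the caller does not wait on the network.
        self._queue.put(record)

    def _drain(self):
        while True:
            record = self._queue.get()
            try:
                self._send(record)
            except Exception:
                # Logging failures are swallowed so they never reach the request path.
                pass
            finally:
                self._queue.task_done()

    def flush(self):
        # Block until every queued record has been processed (useful in tests).
        self._queue.join()


sent = []
logger = FireAndForgetLogger(send=sent.append)
logger.log({"prompt_slug": "sdk_test"})
logger.flush()
print(sent)
```

Because `log` only enqueues the record, the request path pays for a queue insert rather than a network round trip.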

Does Athina support custom evaluations?

Yes, Athina enables you to configure custom evaluators. You can use a custom LLM evaluation, write a custom Python function, or even call an external API for evaluation.
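
As an illustration of the custom Python function option, here is a minimal sketch of an evaluator that checks a response against a word limit, echoing the under-50-words instruction in the prompt example earlier on this page. The function name, its signature, and the shape of the returned dictionary are assumptions for illustration, not Athina's required interface.

```python
def response_length_eval(response: str, max_words: int = 50) -> dict:
    """Pass when the response has fewer than `max_words` words.

    Hypothetical evaluator: the return shape is an illustrative assumption.
    """
    word_count = len(response.split())
    passed = word_count < max_words
    return {
        "passed": passed,
        "word_count": word_count,
        "reason": (
            f"Response has {word_count} words; "
            f"the limit is fewer than {max_words}."
        ),
    }


result = response_length_eval("Athina is a collaborative AI development platform.")
print(result["passed"], result["reason"])
```

Returning a reason alongside the pass/fail flag makes failures easier to review by hand.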

Does Athina work with Azure / Vertex / Bedrock?

Yes, you can use custom models hosted anywhere using Athina.

How long does Athina take to integrate?

You can get set up with logging in just a few minutes. Visit https://docs.athina.ai/logging to get started.

What kind of evaluations does Athina support?

Athina supports over 50 preset evaluations from providers like Athina, OpenAI, Ragas, Guardrails, and more. You can also configure custom evaluations using LLM-as-a-judge or custom Python functions.