Creating a Custom LLM Metric
Custom LLM Metrics allow you to evaluate your AI agent’s performance against criteria you define. Instead of relying only on built-in metrics, you can write your own metric prompts so Snowglobe’s evaluation system measures what matters most to your use case.
Step 1. Start in the Metrics Dashboard
Go to Snowglobe Metrics and click Add Custom Metric. This opens the metric editor, where you can configure and test your new metric.
Step 2. Name the Metric
Give your metric a name. This acts as an identifier, so keep it simple and descriptive (e.g., factual_accuracy, tone_politeness, sales_closure_success).
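The example names above all use lowercase snake_case. If you keep a list of metric names in your own tooling, a check like the following can keep them consistent. This is a minimal sketch: the snake_case rule is an assumption drawn from the examples, not a documented Snowglobe requirement, and `is_valid_metric_name` is a hypothetical helper.

```python
import re

# Assumed convention (not a documented Snowglobe rule): lowercase snake_case,
# starting with a letter, with single underscores between words.
_NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_valid_metric_name(name: str) -> bool:
    """Return True if the name follows the assumed snake_case convention."""
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["factual_accuracy", "tone_politeness", "Tone Politeness"]:
    print(candidate, is_valid_metric_name(candidate))
```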
Step 3. Write a Description
Write a short description of what your metric should evaluate. This helps collaborators understand its purpose. For example:
This metric evaluates whether the AI agent’s responses remain polite and professional, even when the user is frustrated or rude.
The description is human-readable only and doesn’t affect how Snowglobe evaluates; it’s just documentation.
Step 4. Generate and Edit the Metric Prompt
Click Generate Prompt to automatically create a starting point for your metric prompt. The generated prompt will include:
- A description of the criteria to judge
- Instructions for how the model should score the conversation
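To make the two parts of a metric prompt concrete, here is a hypothetical illustration of how criteria and scoring instructions might be combined. The wording, the 1 to 5 scale, and the `build_metric_prompt` helper are all assumptions for illustration; the prompt Snowglobe actually generates may look different.

```python
# Hypothetical template: the wording, the default 1-5 scale, and the function
# name are illustrative assumptions, not Snowglobe's generated output.
def build_metric_prompt(criteria: str, scale_min: int = 1, scale_max: int = 5) -> str:
    """Combine the criteria to judge with instructions for scoring."""
    if scale_min >= scale_max:
        raise ValueError("scale_min must be less than scale_max")
    return (
        "You are evaluating a conversation between a user and an AI agent.\n\n"
        f"Criteria: {criteria}\n\n"
        "Scoring instructions:\n"
        f"- Give a score from {scale_min} to {scale_max}, where {scale_min} means "
        f"the criteria were not met and {scale_max} means they were fully met.\n"
        "- Briefly explain the reason for the score.\n"
    )


prompt = build_metric_prompt(
    "The agent's responses remain polite and professional, "
    "even when the user is frustrated or rude."
)
print(prompt)
```

Stating the scale endpoints explicitly tends to make scores easier to interpret when you review them later.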
Step 5. Choose Your Metric Model
Select the LLM that Snowglobe should use to evaluate this metric. We recommend using larger models such as gpt-4o or gpt-5 if your criteria are sophisticated.
Step 6. Use Your Custom Metric
Once you’re happy with your metric prompt, click Save Metric. Your new metric will now appear in the list of available metrics when you create a simulation.
We recommend that you review your custom metric’s performance after running a simulation to ensure it’s working as expected. You can always edit the metric later if needed.
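One way to review a metric's performance is to hand-label a small sample of conversations and see how often the metric agrees with you. The sketch below assumes you have copied the metric's pass/fail verdicts out of the simulation results by hand; the data and the `agreement_rate` helper are hypothetical, and no Snowglobe export format or API is implied.

```python
def agreement_rate(metric_verdicts: list[bool], human_verdicts: list[bool]) -> float:
    """Fraction of conversations where the metric and the human label match."""
    if len(metric_verdicts) != len(human_verdicts):
        raise ValueError("both lists must cover the same conversations")
    if not metric_verdicts:
        raise ValueError("at least one labeled conversation is required")
    matches = sum(m == h for m, h in zip(metric_verdicts, human_verdicts))
    return matches / len(metric_verdicts)


# Hypothetical sample: verdicts for five conversations, entered by hand.
metric = [True, True, False, True, False]
human = [True, False, False, True, False]
print(agreement_rate(metric, human))  # 4 of 5 match -> 0.8
```

If agreement is low, reading the conversations where the two disagree usually shows which part of the metric prompt needs editing.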