Build custom evaluation metrics tailored to your chatbot’s specific requirements and use cases.
| | LM-Based Judge via Web UI | Code-Based Judge via Connector |
|---|---|---|
| Best for | Most teams, quick setup, prompt-based evaluation | Advanced users with existing models or complex logic |
| Pros | No coding required, easy to iterate, powerful LM reasoning on complex tasks | Maximum flexibility, use existing models, rule-based logic |
| Cons | Limited to LM capabilities, may be slower, more expensive | Requires coding, more setup complexity |
Using High-Level Criteria
Using Custom Evaluation Prompt
Install the Connector
Generate Your Snowglobe API Key
Authenticate the Connector
Use `metric_connector.py` for your custom evaluation logic.
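As an illustration of the rule-based logic mentioned in the comparison above, here is a minimal sketch of what custom evaluation logic could look like in plain Python. Every name in it (`evaluate_response`, `MetricResult`, `BANNED_PHRASES`, `MAX_WORDS`) is hypothetical and chosen only for this example. It does not reproduce the Snowglobe connector's actual interface, so adapt it to whatever structure your `metric_connector.py` expects.

```python
from dataclasses import dataclass

# Hypothetical rule set for illustration; not part of the Snowglobe connector.
BANNED_PHRASES = ("as an ai language model", "i cannot help with that")
MAX_WORDS = 150


@dataclass
class MetricResult:
    score: float  # 1.0 = pass, 0.0 = fail
    reason: str   # short explanation of the verdict


def evaluate_response(response: str) -> MetricResult:
    """Score one chatbot response with simple, deterministic rules."""
    text = response.strip().lower()

    # Rule 1: the response must not be empty.
    if not text:
        return MetricResult(0.0, "empty response")

    # Rule 2: the response must not contain any banned phrase.
    for phrase in BANNED_PHRASES:
        if phrase in text:
            return MetricResult(0.0, f"contains banned phrase: {phrase!r}")

    # Rule 3: the response must stay within the word limit.
    word_count = len(text.split())
    if word_count > MAX_WORDS:
        return MetricResult(0.0, f"too long: {word_count} words")

    return MetricResult(1.0, "passed all rules")
```

Returning a reason alongside the score makes failures easier to debug when iterating on the rules. Checks like these are cheap and deterministic, which is the main trade-off against an LM-based judge.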