Custom Code Evaluator
The built-in evaluators in Agenta may not cover every use case. When they fall short, you can write a custom evaluator tailored to your needs. Custom evaluators can be written in Python, JavaScript, or TypeScript; the examples on this page use Python.
Evaluation code
Your custom evaluator should include a function called evaluate with the following signature:
from typing import Dict

def evaluate(
    app_params: Dict[str, str],
    inputs: Dict[str, str],
    output: str,
    correct_answer: str
) -> float:
    ...  # your scoring logic goes here
The function must return a float between 0.0 and 1.0 representing the evaluation score, where 0.0 indicates a failed evaluation and 1.0 indicates a perfect score.
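Scores do not have to be binary. As an illustration, the sketch below returns a graded score based on character-level similarity, using the standard library's difflib.SequenceMatcher (this similarity measure is our choice for the example, not something Agenta prescribes). ratio() always returns a value in [0.0, 1.0], so it fits the required range directly.

```python
from difflib import SequenceMatcher
from typing import Dict


def evaluate(
    app_params: Dict[str, str],
    inputs: Dict[str, str],
    output: str,
    correct_answer: str
) -> float:
    # SequenceMatcher.ratio() returns a float in [0.0, 1.0];
    # 1.0 means the two (whitespace-trimmed) strings are identical.
    score = SequenceMatcher(None, output.strip(), correct_answer.strip()).ratio()
    # Clamp defensively so the result always stays inside the expected range.
    return max(0.0, min(1.0, score))
```

With this evaluator, an output of "Paris" scores 1.0 against a correct answer of "Paris", while "Paris, France" earns partial credit (about 0.56).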
The function parameters are:
- app_params: A dictionary containing the app's configuration. This includes the prompt, the model, and every other parameter set in the playground, keyed by the same names used there.
- inputs: A dictionary containing the inputs passed to the app.
- output: The output generated by the app.
- correct_answer: The expected (reference) answer for the test case.
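The sketch below shows how these arguments might be used together. The input name "country" and the app_params values in the example call are hypothetical; substitute the input and parameter names of your own app. Agenta passes these arguments itself when it runs the evaluator, so the explicit call at the end is only for local testing.

```python
from typing import Dict


def evaluate(
    app_params: Dict[str, str],
    inputs: Dict[str, str],
    output: str,
    correct_answer: str
) -> float:
    # "country" is a hypothetical input name used for illustration only.
    country = inputs.get("country", "")
    answer = output.strip().lower()
    # Treat an empty output, or one that merely echoes the input, as a failure.
    if not answer or answer == country.strip().lower():
        return 0.0
    # Pass if the reference answer appears anywhere in the output (case-insensitive).
    return 1.0 if correct_answer.strip().lower() in answer else 0.0


# Local test call with hypothetical values.
score = evaluate(
    app_params={"model": "gpt-3.5-turbo", "temperature": "0.7"},
    inputs={"country": "France"},
    output="The capital of France is Paris.",
    correct_answer="Paris",
)
```

Here score is 1.0, because "paris" occurs in the lower-cased output.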
Here's an example implementation of an exact match evaluator:
from typing import Dict

def evaluate(
    app_params: Dict[str, str],
    inputs: Dict[str, str],
    output: str,
    correct_answer: str
) -> float:
    # Full score only when the output matches the correct answer exactly.
    return 1.0 if output == correct_answer else 0.0
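Exact matching is strict: a trailing newline or a difference in capitalization makes the evaluation fail. A more forgiving variant, sketched below as one possible approach, normalizes both strings before comparing them. The normalization rules (case folding and collapsing whitespace) are our own choice for this example; the helper is defined inside evaluate so the whole evaluator stays in a single function.

```python
import re
from typing import Dict


def evaluate(
    app_params: Dict[str, str],
    inputs: Dict[str, str],
    output: str,
    correct_answer: str
) -> float:
    def normalize(text: str) -> str:
        # Collapse runs of whitespace to a single space, trim, and case-fold.
        return re.sub(r"\s+", " ", text).strip().casefold()

    # Exact match after normalization, so "  Paris\n" and "paris" count as equal.
    return 1.0 if normalize(output) == normalize(correct_answer) else 0.0
```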