32 changes: 21 additions & 11 deletions examples/README.md
@@ -32,14 +32,15 @@ def helper_function():

### 2. Evaluator (`evaluator.py`)

Your evaluator can return either a **dictionary** or an **`EvaluationResult`** object:

```python
def evaluate(program_path: str) -> Dict:
    """
    Evaluate the program and return metrics.

    Can return either a dict or an EvaluationResult object.
    Use EvaluationResult if you want to include artifacts for debugging.
    """
    try:
        # Import and run your program
        # ... @@ -57,13 +57,23 @@ (intervening lines not expanded in the diff view) ...
            'combined_score': 0.0,  # Always return combined_score, even on error
            'error': str(e)
        }

# Or use EvaluationResult for artifacts support:
from openevolve.evaluation_result import EvaluationResult

def evaluate(program_path: str) -> EvaluationResult:
    return EvaluationResult(
        metrics={'combined_score': 0.8, 'accuracy': 0.9},
        artifacts={'debug_info': 'useful debugging data'}
    )
```

**Critical Requirements:**
- ✅ **Return a dictionary or `EvaluationResult`** - both are supported
- ✅ **Must include `'combined_score'`** - this is the primary metric OpenEvolve uses
- ✅ Higher `combined_score` values should indicate better programs
- ✅ Handle exceptions and return `combined_score: 0.0` on failure
- ✅ Use `EvaluationResult` with artifacts for richer debugging feedback (see the sketch after this list)
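
For the last two requirements together, here is a minimal sketch of a failure path that keeps the run alive with `combined_score: 0.0` while attaching debugging context as artifacts (the `run_program_and_score` helper and the artifact keys are hypothetical, not part of the OpenEvolve API):

```python
import traceback

from openevolve.evaluation_result import EvaluationResult

def evaluate(program_path: str) -> EvaluationResult:
    try:
        # Hypothetical helper that imports the program and computes numeric metrics
        metrics = run_program_and_score(program_path)
        return EvaluationResult(metrics=metrics, artifacts={})
    except Exception as e:
        # Never raise: report a zero combined_score plus debugging artifacts
        return EvaluationResult(
            metrics={'combined_score': 0.0},
            artifacts={'error': str(e), 'traceback': traceback.format_exc()},
        )
```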

### 3. Configuration (`config.yaml`)

@@ -121,18 +132,17 @@ log_level: "INFO"

## Common Configuration Mistakes

❌ **Wrong:** `feature_dimensions: 2`
✅ **Correct:** `feature_dimensions: ["score", "complexity"]`

❌ **Wrong:** Using `'total_score'` metric name
✅ **Correct:** Using `'combined_score'` metric name

❌ **Wrong:** Multiple EVOLVE-BLOCK sections
✅ **Correct:** Exactly one EVOLVE-BLOCK section (see the sketch below)
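
As an illustration of the single-block rule, a minimal initial program might look like this sketch (it assumes the `# EVOLVE-BLOCK-START` / `# EVOLVE-BLOCK-END` comment markers used in the OpenEvolve examples; the function bodies are placeholders):

```python
"""Minimal initial program sketch with exactly one evolve block."""

# EVOLVE-BLOCK-START
def helper_function():
    # Only the code between the two markers is rewritten during evolution
    return 0.0
# EVOLVE-BLOCK-END

def main():
    # Everything outside the single block stays fixed across generations
    print(helper_function())

if __name__ == "__main__":
    main()
```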

💡 **Tip:** Both `{'combined_score': 0.8, ...}` dict and `EvaluationResult(metrics={...}, artifacts={...})` are valid return types

## MAP-Elites Feature Dimensions Best Practices

When using custom feature dimensions, your evaluator must return **raw continuous values**, not pre-computed bin indices:
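
The example that follows this sentence is not expanded in the diff view. As an illustration only (the metric names match the `["score", "complexity"]` dimensions above, and the values are hypothetical), raw measurements might look like this:

```python
def evaluate(program_path: str) -> dict:
    # Hypothetical raw measurements; OpenEvolve bins these values into
    # MAP-Elites cells itself, so no bin indices are computed here.
    score = 0.87          # raw quality in [0, 1]
    complexity = 412.0    # e.g. raw length or token count of the evolved code

    return {
        'combined_score': score,   # primary metric, higher is better
        'score': score,            # raw value for the "score" dimension
        'complexity': complexity,  # raw value for the "complexity" dimension
    }
```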