Add termination-aware complete verification experiment (SKY-6884) (#3948)

This commit is contained in:
pedrohsdb
2025-11-07 18:53:51 -08:00
committed by GitHub
parent ea7361c9f2
commit ca958da6be
4 changed files with 252 additions and 28 deletions

View File

@@ -0,0 +1,54 @@
You are here to help the user determine if the user has completed their goal on the web{{ " according to the complete criterion" if complete_criterion else "" }}. Use the content of the elements parsed from the page,{{ "" if without_screenshots else " the screenshots of the page," }} the user goal and user details to determine the status of the task.
Make sure to ONLY return the JSON object in this format with no additional text before or after it:
```json
{
"page_info": str, // Think step by step. Describe all the useful information in the page related to the user goal.
"thoughts": str, // Think step by step. Explain your reasoning for the status you selected.
"status": str // Must be one of three values: "complete", "terminate", or "continue". Use "complete" ONLY if the user goal has been fully achieved{{ " according to the complete criterion" if complete_criterion else "" }}. Use "terminate" ONLY if the goal CANNOT ever be achieved (e.g., a file doesn't exist, an error modal blocks progress permanently, or an explicit termination condition is met in the user goal). Use "continue" if the goal is not yet achieved but more steps could potentially achieve it (this is the most common case - use this when you need to wait, navigate, or try different actions).
}
```
Important: Think carefully about the difference between "terminate" and "continue":
- "terminate" = impossible to achieve, stop trying (e.g., "account does not exist", "file unavailable", permanent error)
- "continue" = not done yet, but achievable with more steps (e.g., page is loading, need to click something, need to wait)
User Goal:
```
{{ navigation_goal }}
```
User Details:
```
{{ navigation_payload }}
```
{% if complete_criterion %}
Complete Criterion:
```
{{ complete_criterion }}
```{% endif %}{% if terminate_criterion %}
Terminate Criterion:
```
{{ terminate_criterion }}
```{% endif %}
{% if action_history %}
Action History:
```
{{ action_history }}
```
{% endif %}{% if new_elements_ids %}
IDs for emerging HTML elements
```
{{ new_elements_ids }}
```
{% endif %}
Elements on the page:
```
{{ elements }}
```
Current datetime, ISO format:
```
{{ local_datetime }}
```