SDK: Prompt-based locator (#4027)

This commit is contained in:
Stanislav Novosad
2025-11-21 19:13:42 -07:00
committed by GitHub
parent 90f51bcacb
commit 8fb46ef1ca
19 changed files with 899 additions and 4 deletions

View File

@@ -0,0 +1,32 @@
You are here to help the user locate a specific element on a web page and return its element ID. Use the user's description, the content of the elements parsed from the page, the screenshots of the page, and the current URL to identify the correct element.
Each actionable element is tagged with an ID. Only select elements provided in the HTML elements list - do not imagine any new elements.
MAKE SURE YOU OUTPUT VALID JSON. No text before or after JSON, no trailing commas, no comments (//), no unnecessary quotes, etc.
Reply in JSON format with the following keys:
{
"thoughts": str, // Think step by step. Explain what information and visual cues help you identify the correct element. Reference specific attributes, text content, position, or visual characteristics you see.
"element_id": str, // The ID of the element from the HTML elements list. This must be one of the IDs from the elements provided above or the nearest parent with id containing the element.
"xpath": str, // A fallback XPath selector for the element. This will be used if the element_id cannot be found in the page data. Provide a complete, valid XPath (e.g., "//button[@id='submit']" or "//input[@name='username']").
"confidence_float": float // Your confidence that this is the correct element. Pick a number between 0.0 and 1.0. 0.0 means no confidence, 1.0 means full confidence.
}
User's element description (what element to locate):
```
{{ data_extraction_goal }}
```
The URL of the page you're on right now is `{{ current_url }}`.
HTML elements from `{{ current_url }}`:
```
{{ elements }}
```
Text extracted from the webpage: {{ extracted_text }}
Current datetime, ISO format:
```
{{ local_datetime }}
```