update custom selection prompt (#799)

2024-09-10 14:12:38 +08:00
parent deb80bce9c
commit ddf2b32b3b
4 changed files with 59 additions and 28 deletions
--- a/skyvern/forge/prompts/skyvern/auto-completion-choose-option.j2
+++ b/skyvern/forge/prompts/skyvern/auto-completion-choose-option.j2
@@ -1,13 +1,14 @@
-There is an input element on a HTML page. Based on the context and information you're provided, you have two goals: 
+There is an input element on an HTML page. Based on the context and information provided, you have two goals:
-    - Confirm if there is an auto completion attempt showing up after the user input the current value.
+  - Confirm if an auto-completion attempt appears after the user inputs the current value.
-    - If available auto completion suggestions show up, help user choose the element that's the most relevant to the input value.
+  - If auto-completion suggestions appear, assist the user in selecting the most appropriate element based on the user’s goal, details, and the context.
-You can confirm auto completion attempt based on the following rules:
+You can confirm an auto-completion attempt based on the following rules:
-    - Several auto completion suggestions show up for the input value. 
+  - Several auto-completion suggestions appear for the input value.
-    - Some messages, like "No results", "No match", also indicate an attempt to give auto completion suggestions.
+  - Although messages like “No results” and “No match” mean no option was matched, they still indicate an attempt to generate auto-completion suggestions.
-Potential auto completion suggesstion could only be:
+You must identify a potential auto-completion suggestion based on the following rules:
-    - Element with ID from "HTML elements". Don't hallucinate any potential option outside "HTML elements".
+  - The option must be an element with an ID from the provided “HTML elements”. Do not create or assume options outside of these elements.
+  - The content of the option must be meaningful. Do not consider non-message indicators like “No results” or “No match” as valid options.
 MAKE SURE YOU OUTPUT VALID JSON. No text before or after JSON, no trailing commas, no comments (//), no unnecessary quotes, etc.
 Each interactable element is tagged with an ID.
@@ -15,9 +16,10 @@ Each interactable element is tagged with an ID.
 Reply in JSON format with the following keys:
 {
    "auto_completion_attempt": bool, // True if there's any auto completion attempt based on the rules. Otherwise, it should be False.
-    "reasoning": str, // The reasoning behind the decision. Be specific, referencing input value and element ids in your reasoning. Mention why you chose the element id. Keep the reasoning short and to the point.
+    "reasoning": str, // The reasoning behind the decision. Be specific, referencing the value and the element id in your reasoning. Mention why you chose the element id. Keep the reasoning short and to the point.
    "confidence_float": float, // The confidence of the action. Pick a number between 0.0 and 1.0. 0.0 means no confidence, 1.0 means full confidence.
-    "relevance_float": float, // The relative between the input value and the element. Pick a number between 0.00 and 1.00. 0.00 means no relevance, 1.00 means full relevance, the precision is 0.01.
+    "relevance_float": float, // The relevance between the selected element and the provided information. You should consider how much the selected option is related to the user goal, the user details and the context. Pick a number between 0.00 and 1.00. 0.00 means no relevance, 1.00 means full relevance, the precision is 0.01.
    "value": str, // The value to select.
    "id": str, // The id of the most relevant and interactable element to take the action. The id must be from "HTML elements". It should be null if no element is relative or there's no auto completion suggestion.
 }
@@ -31,6 +33,16 @@ Input value:
 {{ filled_value }}
 ```
+User goal:
+```
+{{ navigation_goal }}
+```
+User details:
+```
+{{ navigation_payload_str }}
+```
 HTML elements:
 ```
 {{ elements }}
--- a/skyvern/forge/prompts/skyvern/custom-select.j2
+++ b/skyvern/forge/prompts/skyvern/custom-select.j2
@@ -1,31 +1,43 @@
-You are doing a select action on HTML page. Help to click the best match element for the target value among HTML elements based on the context.
+You are performing a selection action on an HTML page. Assist the user in selecting the most appropriate option to advance toward their goal, considering the context, user details, and the DOM elements provided in the list.
-You can find the match element based on the following attempts:
+
-  1. Find the semantically most similar element
+You can identify the matching element based on the following guidelines:
-  2. Reconsider if target value is reasonable based on context and the options in the HTML elements. If it doesn't make sense, you can tweak the target value into a reasonable one.
+  1. Select the most suitable element based on the user goal, user details, and the context.
-  3. Find the element, which semantically is the superset of target value. Like "Others", "None of them matched"
+  2. If no option is a perfect match, choose a fallback option such as “Others” or “None of the above”.
-  4. If the field is required, don't leave it blank and don't choose the semantical placeholder value, like "Please select", "-", "Select...".
+  3. If a field is required, do not leave it blank.
+  4. If a field is required, do not select a placeholder value, such as “Please select”, “-”, or “Select…”.
+  5. Exclude loading indicators like “loading more results” as valid options.
 MAKE SURE YOU OUTPUT VALID JSON. No text before or after JSON, no trailing commas, no comments (//), no unnecessary quotes, etc.
 Each interactable element is tagged with an ID.
 Reply in JSON format with the following keys:
 {
-    "reasoning": str, // The reasoning behind the action. Be specific, referencing target value and element ids in your reasoning. Mention why you chose the element id. Keep the reasoning short and to the point.
+    "reasoning": str, // The reasoning behind the action. Be specific, referencing the value and the element id in your reasoning. Mention why you chose the element id. Keep the reasoning short and to the point.
    "confidence_float": float, // The confidence of the action. Pick a number between 0.0 and 1.0. 0.0 means no confidence, 1.0 means full confidence
    "id": str, // The id of the element to take action on. The id has to be one from the elements list
-    "value": str, // The value to select.
+    "value": str, // The value to select.{% if target_value %}
-    "relevant": bool, // True if the value you select is relevant to the target value, otherwise False.
+    "relevant": bool, // True if the value you select is relevant to the target value, otherwise False.{% endif %}
 }
 Context:
 ```
 {{ context_reasoning }}
 ```
-
+{% if target_value %}
 Target value:
 ```
 {{ target_value }}
 ```
+{% endif %}
+User goal:
+```
+{{ navigation_goal }}
+```
+User details:
+```
+{{ navigation_payload_str }}
+```
 HTML elements:
 ```
--- a/skyvern/forge/prompts/skyvern/opened-dropdown-confirm.j2
+++ b/skyvern/forge/prompts/skyvern/opened-dropdown-confirm.j2
@@ -1,6 +1,9 @@
-There is a screenshot from part of Web HTML page. Help me confirm it if it's an opened dropdown menu.
+There is a screenshot from a part of a web HTML page. Help me confirm if it is an open dropdown menu.
-An opened dropdown menu could be defined as:
+
- - At least two options show on the screenshot.
+An open dropdown menu can be defined as:
+  - At least one option is visible in the screenshot.
+  - Do not consider it an open dropdown menu if the only visible option displays a message like “No results” or “No match”.
 MAKE SURE YOU OUTPUT VALID JSON. No text before or after JSON, no trailing commas, no comments (//), no unnecessary quotes, etc.
 Reply in JSON format with the following keys:
--- a/skyvern/webeye/actions/handler.py
+++ b/skyvern/webeye/actions/handler.py
@@ -1120,6 +1120,8 @@ async def choose_auto_completion_dropdown(
            "auto-completion-choose-option",
            context_reasoning=action.reasoning,
            filled_value=text,
+            navigation_goal=task.navigation_goal,
+            navigation_payload_str=json.dumps(task.navigation_payload),
            elements=html,
        )
        LOG.info(
@@ -1462,27 +1464,29 @@ async def select_from_dropdown(
    prompt = prompt_engine.load_prompt(
        "custom-select",
        context_reasoning=action.reasoning,
-        target_value=target_value,
+        target_value=target_value if not force_select and should_relevant else "",
        navigation_goal=task.navigation_goal,
        navigation_payload_str=json.dumps(task.navigation_payload),
        elements=html,
    )
    LOG.info(
        "Calling LLM to find the match element",
        target_value=target_value,
        step_id=step.step_id,
        task_id=task.task_id,
    )
    json_response = await llm_handler(prompt=prompt, step=step)
+    value: str | None = json_response.get("value", None)
+    single_select_result.value = value
    LOG.info(
        "LLM response for the matched element",
-        target_value=target_value,
+        matched_value=value,
        response=json_response,
        step_id=step.step_id,
        task_id=task.task_id,
    )
    value: str | None = json_response.get("value", None)
    single_select_result.value = value
    element_id: str | None = json_response.get("id", None)
    if not element_id:
        raise NoElementMatchedForTargetOption(target=target_value, reason=json_response.get("reasoning"))
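
Note on the `target_value` gating in `select_from_dropdown`: the new argument reads as `target_value if (not force_select and should_relevant) else ""`, so an empty string reaches the template whenever selection is forced or relevance is not required, and the `{% if target_value %}` guards in `custom-select.j2` then drop both the "Target value" section and the `"relevant"` key. The sketch below mirrors that logic in plain Python. It assumes `force_select` and `should_relevant` are booleans (their definitions are not part of this diff), and the helper names `effective_target_value` and `expected_response_keys` are hypothetical, introduced only for illustration.

```python
def effective_target_value(target_value: str, force_select: bool, should_relevant: bool) -> str:
    """Mirror the conditional expression passed to load_prompt (hypothetical helper)."""
    # Python parses this as: target_value if (not force_select and should_relevant) else ""
    return target_value if not force_select and should_relevant else ""


def expected_response_keys(target_value: str) -> list[str]:
    """Keys the custom-select prompt requests, given the {% if target_value %} guard."""
    keys = ["reasoning", "confidence_float", "id", "value"]
    if target_value:  # an empty string is falsy in Jinja as well as in Python
        keys.append("relevant")
    return keys


if __name__ == "__main__":
    for force_select, should_relevant in [(False, True), (True, True), (False, False)]:
        tv = effective_target_value("California", force_select, should_relevant)
        print(force_select, should_relevant, repr(tv), expected_response_keys(tv))
```

Only the combination `force_select=False, should_relevant=True` keeps the target value, so callers reading `"relevant"` from the response should expect the key to be absent in the other cases.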