refactor custom-select/auto-complete context (#830)

2024-09-14 17:28:08 +08:00
parent 6571604fa5
commit 8c2a733ba2
6 changed files with 86 additions and 53 deletions
--- a/skyvern/forge/prompts/skyvern/custom-select.j2
+++ b/skyvern/forge/prompts/skyvern/custom-select.j2
@@ -1,4 +1,4 @@
-You are performing a selection action on an HTML page. Assist the user in selecting the most appropriate option to advance toward their goal, considering the context, user details, and the DOM elements provided in the list.
+You are performing a {{ "multi-level selection" if select_history else "selection" }} action on an HTML page. Assist the user in selecting the most appropriate option to advance toward their goal, considering the context, user details, and the DOM elements provided in the list.

 You can identify the matching element based on the following guidelines:
  1. Select the most suitable element based on the user goal, user details, and the context.
--- a/skyvern/forge/prompts/skyvern/extract-action.j2
+++ b/skyvern/forge/prompts/skyvern/extract-action.j2
@@ -17,8 +17,6 @@ Reply in JSON format with the following keys:
        "reasoning": str, // The reasoning behind the action. Be specific, referencing any user information and their fields and element ids in your reasoning. Mention why you chose the action type, and why you chose the element id. Keep the reasoning short and to the point.
        "confidence_float": float, // The confidence of the action. Pick a number between 0.0 and 1.0. 0.0 means no confidence, 1.0 means full confidence
        "action_type": str, // It's a string enum: "CLICK", "INPUT_TEXT", "UPLOAD_FILE", "SELECT_OPTION", "WAIT", "SOLVE_CAPTCHA", "COMPLETE", "TERMINATE". "CLICK" is an element you'd like to click. "INPUT_TEXT" is an element you'd like to input text into. "UPLOAD_FILE" is an element you'd like to upload a file into. "SELECT_OPTION" is an element you'd like to select an option from. "WAIT" action should be used if there are no actions to take and there is some indication on screen that waiting could yield more actions. "WAIT" should not be used if there are actions to take. "SOLVE_CAPTCHA" should be used if there's a captcha to solve on the screen. "COMPLETE" is used when the user goal has been achieved AND if there's any data extraction goal, you should be able to get data from the page. Never return a COMPLETE action unless the user goal is achieved. "TERMINATE" is used to terminate the whole task with a failure when it doesn't seem like the user goal can be achieved. Do not use "TERMINATE" if waiting could lead the user towards the goal. Only return "TERMINATE" if you are on a page where the user goal cannot be achieved. All other actions are ignored when "TERMINATE" is returned.
-        "field_information": str, // The target field for the action. Only for INPUT_TEXT and SELECT_OPTION actions. Otherwise it should be null.
-        "required_field": bool, // True if it's a required field, otherwise false.
        "id": str, // The id of the element to take action on. The id has to be one from the elements list
        "text": str, // Text for INPUT_TEXT action only
        "file_url": str, // The url of the file to upload if applicable. This field must be present for UPLOAD_FILE but can also be present for CLICK only if the click is to upload the file. It should be null otherwise.
--- a/skyvern/forge/prompts/skyvern/parse-input-or-select-context.j2
+++ b/skyvern/forge/prompts/skyvern/parse-input-or-select-context.j2
@@ -0,0 +1,21 @@
+You are a browser agent performing actions on the web. You are instructed to take an INPUT or SELECT action on the element(id: "{{ element_id }}"). Extract some detailed information from the context/reasoning, and double-check the information by analysing the HTML elements.
+
+MAKE SURE YOU OUTPUT VALID JSON. No text before or after JSON, no trailing commas, no comments (//), no unnecessary quotes, etc.
+
+Reply in the following JSON format:
+{
+    "thought": str, // A string to describe how you double-check the information to ensure the accuracy.
+    "field": str, // Which field is this action intended to fill out?
+    "is_required": bool, // True if this is a required field, otherwise false.
+}
+
+Existing reasoning context:
+```
+{{ action_reasoning }}
+```
+
+HTML elements:
+```
+{{ elements }}
+```
+