Move the code over from private repository (#3)

2024-03-01 10:09:30 -08:00
parent 32dd6d92a5
commit 9eddb3d812
93 changed files with 16798 additions and 0 deletions
--- a/skyvern/forge/prompts/skyvern/extract-action.j2
+++ b/skyvern/forge/prompts/skyvern/extract-action.j2
@@ -0,0 +1,65 @@
+Identify actions to help user progress towards the user goal using the DOM elements given in the list and the screenshot of the website.
+Include only the elements that are relevant to the user goal, without altering or imagining new elements.
+Use the details from the user details to fill in necessary values. Always complete required fields if the field isn't already filled in.
+MAKE SURE YOU OUTPUT VALID JSON. No text before or after JSON, no trailing commas, no comments (//), no unnecessary quotes, etc.
+Each element is tagged with an ID.
+If you see any information in red in the page screenshot, this means a condition wasn't satisfied. prioritize actions with the red information.
+If you see a popup in the page screenshot, prioritize actions on the popup.
+
+{% if "lever" in url %}
+DO NOT UPDATE ANY LOCATION FIELDS
+{% endif %}
+
+Reply in JSON format with the following keys:
+{
+    "actions": array // An array of actions. Here's the format of each action:
+    [{
+        "reasoning": str, // The reasoning behind the action. Be specific, referencing any user information and their fields and element ids in your reasoning. Mention why you chose the action type, and why you chose the element id. Keep the reasoning short and to the point.
+        "confidence_float": float, // The confidence of the action. Pick a number between 0.0 and 1.0. 0.0 means no confidence, 1.0 means full confidence
+        "action_type": str, // It's a string enum: "CLICK", "INPUT_TEXT", "UPLOAD_FILE", "SELECT_OPTION", "WAIT", "SOLVE_CAPTCHA", "COMPLETE", "TERMINATE". "CLICK" is an element you'd like to click. "INPUT_TEXT" is an element you'd like to input text into. "UPLOAD_FILE" is an element you'd like to upload a file into. "SELECT_OPTION" is an element you'd like to select an option from. "WAIT" action should be used if there are no actions to take and there is some indication on screen that waiting could yield more actions. "WAIT" should not be used if there are actions to take. "SOLVE_CAPTCHA" should be used if there's a captcha to solve on the screen. "COMPLETE" is used when the user goal has been achieved AND if there's any data extraction goal, you should be able to get data from the page. If there is any other action to take, do not add "COMPLETE" type at all. "TERMINATE" is used to terminate the whole task with a failure when it doesn't seem like the user goal can be achieved. Do not use "TERMINATE" if waiting could lead the user towards the goal. Only return "TERMINATE" if you are on a page where the user goal cannot be achieved. All other actions are ignored when "TERMINATE" is returned.
+        "id": int, // The id of the element to take action on. The id has to be one from the elements list
+        "text": str, // Text for INPUT_TEXT action only
+        "file_url": str, // The url of the file to upload if applicable. This field must be present for UPLOAD_FILE but can also be present for CLICK only if the click is to upload the file. It should be null otherwise.
+        "option": {  // The option to select for SELECT_OPTION action only. null if not SELECT_OPTION action
+            "label": str, // the label of the option if any. MAKE SURE YOU USE THIS LABEL TO SELECT THE OPTION. DO NOT PUT ANYTHING OTHER THAN A VALID OPTION LABEL HERE
+            "index": int, // the id corresponding to the optionIndex under the the select element.
+            "value": str // the value of the option. MAKE SURE YOU USE THIS VALUE TO SELECT THE OPTION. DO NOT PUT ANYTHING OTHER THAN A VALID OPTION VALUE HERE
+        }
+    }],
+}
+
+{% if action_history %}
+Consider the action history from the last step and the screenshot together, if actions from the last step don't yield positive impact, try other actions or other action combinations.
+{% endif %}
+
+Clickable elements from `{{ url }}`:
+```
+{{ elements }}
+```
+
+User goal:
+```
+{{ navigation_goal }}
+```
+{% if data_extraction_goal %}
+
+User Data Extraction Goal:
+```
+{{ data_extraction_goal }}
+```
+{% endif %}
+
+User details:
+```
+{{ navigation_payload_str }}
+```
+{% if action_history %}
+
+Action results from previous steps: (note: even if the action history suggests goal is achieved, check the screenshot and the DOM elements to make sure the goal is achieved)
+{{ action_history }}
+{% endif %}
+
+Current datetime in UTC:
+```
+{{ utc_datetime }}
+```
--- a/skyvern/forge/prompts/skyvern/extract-information.j2
+++ b/skyvern/forge/prompts/skyvern/extract-information.j2
@@ -0,0 +1,16 @@
+You are given a screenshot, user data extraction goal, the JSON schema for the output data format, and the current URL.
+
+Your task is to extract the requested information from the screenshot and {% if extracted_information_schema %}output it in the specified JSON schema format:
+{{ extracted_information_schema }} {% else %}output in strictly JSON format {% endif %}
+
+Add as much details as possible to the output JSON object while conforming to the output JSON schema.
+
+Do not ever include anything other than the JSON object in your output, and do not ever include any additional fields in the JSON object.
+
+If you are unable to extract the requested information for a specific field in the json schema, please output a null value for that field.
+
+User Data Extraction Goal: {{ data_extraction_goal }}
+
+Current URL: {{ current_url }}
+
+Text extracted from the webpage: {{ extracted_text }}