add actions db model and caching V0 (#980)

2024-10-15 12:06:50 -07:00
parent e7583ac878
commit 9048cdfa73
19 changed files with 731 additions and 90 deletions
--- a/skyvern/forge/prompts/skyvern/answer-user-detail-questions.j2
+++ b/skyvern/forge/prompts/skyvern/answer-user-detail-questions.j2
@@ -0,0 +1,25 @@
+You will be given information about a user's goal and details. 
+
+Your job is to answer the user's questions based on the information provided.
+
+The user's questions will be provided in JSON format.
+
+Your answers should be direct and to the point. No need to explain the answer.
+
+Your response should be in JSON format. Basically fill in the answer part and return the JSON.
+
+User's goal: {{ navigation_goal }}
+
+User's details: {{ navigation_payload }}
+
+User's questions: {{ queries_and_answers }}
+
+YOUR RESPONSE HAS TO BE IN JSON FORMAT. DO NOT RETURN ANYTHING ELSE. 
+THESE ANSWERS WILL BE USED TO FILL OUT INFORMATION ON A WEBPAGE. DO NOT INCLUDE ANY UNRELATED INFORMATION OR UNNECESSARY DETAILS IN YOUR ANSWERS.
+
+EXAMPLE RESPONSE FORMAT:
+{
+  "question_1": "answer_1",
+  "question_2": "answer_2",
+  "question_3": "answer_3"
+}
--- a/skyvern/forge/prompts/skyvern/check-user-goal.j2
+++ b/skyvern/forge/prompts/skyvern/check-user-goal.j2
@@ -1,4 +1,4 @@
-Based on the content of the screenshot and the elements on the page, determine whether the user goal has been successfully completed or not.
+Based on the content of the elements on the page, determine whether the user goal has been successfully completed or not.

 The JSON object should be in this format:
 ```json
@@ -7,15 +7,15 @@ The JSON object should be in this format:
  "user_goal_achieved": bool // True if the user goal has been completed, False otherwise.
 }

-Make sure to ONLY return the JSON object, with no additional text before or after it. Do not make any assumptions based on the screenshot, return a response solely based on what you observe in the screenshot and nothing else.
+Make sure to ONLY return the JSON object, with no additional text before or after it. Do not make any assumptions, return a response solely based on the elements on the page.

 Examples:
 {
-  "reasoning": "The screenshot shows a success message for a file upload field. Since the user's goal is to upload a file, it has been successfully completed.",
+  "reasoning": "There is a success message for a file upload field. Since the user's goal is to upload a file, it has been successfully completed.",
  "user_goal_achieved": true
 }
 {
-  "reasoning": "The screenshot shows a job application form with fields. Since the user's goal is to submit a job application, it has not been successfully completed.",
+  "reasoning": "This is a job application form with fields. Since the user's goal is to submit a job application, it has not been successfully completed.",
  "user_goal_achieved": false
 }

--- a/skyvern/forge/prompts/skyvern/extract-action.j2
+++ b/skyvern/forge/prompts/skyvern/extract-action.j2
@@ -14,7 +14,9 @@ Reply in JSON format with the following keys:
    "action_plan": str, // A string that describes the plan of actions you're going to take. Be specific and to the point. Use this as a quick summary of the actions you're going to take, and what order you're going to take them in, and how that moves you towards your overall goal. Output "COMPLETE" action in the "actions" if user_goal_achieved is True.
    "actions": array // An array of actions. Here's the format of each action:
    [{
-        "reasoning": str, // The reasoning behind the action. Be specific, referencing any user information and their fields and element ids in your reasoning. Mention why you chose the action type, and why you chose the element id. Keep the reasoning short and to the point.
+        "reasoning": str, // The reasoning behind the action. This reasoning must be user information agnostic. Mention why you chose the action type, and why you chose the element id. Keep the reasoning short and to the point.
+        "user_detail_query": str, // Think of this value as a Jeopardy question. Ask the user for the details you need for executing this action. Ask the question even if the details are disclosed in user goal or user details. If it's a text field, ask for the text. If it's a file upload, ask for the file. If it's a dropdown, ask for the relevant information. If you are clicking on something specific, ask about what to click on. If you're downloading a file and you have multiple options, ask the user which one to download. Otherwise, use null. Examples are: "What product ID should I input into the search bar?", "What file should I upload?", "What is the previous insurance provider of the user?", "Which invoice should I download?", "Does the user have any pets?". If the action doesn't require any user details, use null.
+        "user_detail_answer": str, // The answer to the `user_detail_query`. The source of this answer can be user goal or user details.
        "confidence_float": float, // The confidence of the action. Pick a number between 0.0 and 1.0. 0.0 means no confidence, 1.0 means full confidence
        "action_type": str, // It's a string enum: "CLICK", "INPUT_TEXT", "UPLOAD_FILE", "SELECT_OPTION", "WAIT", "SOLVE_CAPTCHA", "COMPLETE", "TERMINATE". "CLICK" is an element you'd like to click. "INPUT_TEXT" is an element you'd like to input text into. "UPLOAD_FILE" is an element you'd like to upload a file into. "SELECT_OPTION" is an element you'd like to select an option from. "WAIT" action should be used if there are no actions to take and there is some indication on screen that waiting could yield more actions. "WAIT" should not be used if there are actions to take. "SOLVE_CAPTCHA" should be used if there's a captcha to solve on the screen. "COMPLETE" is used when the user goal has been achieved AND if there's any data extraction goal, you should be able to get data from the page. Never return a COMPLETE action unless the user goal is achieved. "TERMINATE" is used to terminate the whole task with a failure when it doesn't seem like the user goal can be achieved. Do not use "TERMINATE" if waiting could lead the user towards the goal. Only return "TERMINATE" if you are on a page where the user goal cannot be achieved. All other actions are ignored when "TERMINATE" is returned.
        "id": str, // The id of the element to take action on. The id has to be one from the elements list