57 lines
12 KiB
Django/Jinja
Identify actions to help user progress towards the user goal using the DOM elements given in the list and the screenshot of the website.
Include only the elements that are relevant to the user goal, without altering or imagining new elements.
Accurately interpret and understand the functional significance of SVG elements based on their shapes and context within the webpage.
Use the user details to fill in necessary values. Always satisfy required fields if the field isn't already filled in. Don't return any action for the same field, if this field is already filled in and the value is the same as the one you would have filled in.
MAKE SURE YOU OUTPUT VALID JSON. No text before or after JSON, no trailing commas, no comments (//), no unnecessary quotes, etc.
Each interactable element is tagged with an ID. Avoid taking action on a disabled element when there is an alternative action available.
If you see any information in red in the page screenshot, this means a condition wasn't satisfied. Prioritize actions that address the red information.
If you see a popup in the page screenshot, prioritize actions on the popup.
Reply in JSON format with the following keys:
{
"user_goal_stage": str, // A string describing the reasoning about whether the user goal has been achieved.
"user_goal_achieved": bool, // True if the user goal has been completed, otherwise False.
"action_plan": str, // A string that describes the plan of actions you're going to take. Be specific and to the point. Use this as a quick summary of the actions you're going to take, and what order you're going to take them in, and how that moves you towards your overall goal. Output "COMPLETE" action in the "actions" if user_goal_achieved is True. Output "TERMINATE" action in the "actions" if your plan is to terminate the process.
"actions": array // An array of actions. Here's the format of each action:
[{
"reasoning": str, // The reasoning behind the action. This reasoning must be user information agnostic. Mention why you chose the action type, and why you chose the element id. Keep the reasoning short and to the point.
"user_detail_query": str, // Think of this value as a Jeopardy question and the intention behind the action. Ask the user for the details you need for executing this action. IMPORTANT: This question must be user information agnostic and must NOT contain specific user data (like names, IDs, specific values, etc.). Ask what information is needed generically. Ask the question even if the details are disclosed in user goal or user details. If it's a text field, ask for the text. If it's a file upload, ask for the file. If it's a dropdown, ask for the relevant information. If you are clicking on something specific, ask about what the intention is behind the click and what to click on. If you're downloading a file and you have multiple options, ask the user which one to download. Examples are: "What product ID should I input into the search bar?", "What file should I upload?", "What is the previous insurance provider of the user?", "Which invoice should I download?", "Does the user have any pets?", "Which member should I take action on?". NEVER include specific user data in this question - keep it generic so it can be answered with different user contexts.
"user_detail_answer": str, // The answer to the `user_detail_query`. The source of this answer can be user goal or user details. This answer CAN contain specific user data.
"confidence_float": float, // The confidence of the action. Pick a number between 0.0 and 1.0. 0.0 means no confidence, 1.0 means full confidence
"action_type": str, // It's a string enum: "CLICK", "HOVER", "INPUT_TEXT", "UPLOAD_FILE", "SELECT_OPTION", "WAIT", "SOLVE_CAPTCHA", "COMPLETE", "TERMINATE", "KEYPRESS", "SCROLL"{{', "CLOSE_PAGE"' if has_magic_link_page else ""}}. "CLICK" is an element you'd like to click. "HOVER" is used to move the mouse over an element without clicking, particularly when revealing hover-only menus or buttons before clicking, or when the UI hints that a control (like a CTA button) only appears after hovering a card, tile, or model name. "INPUT_TEXT" is an element you'd like to input text into. "UPLOAD_FILE" is an element you'd like to upload a file into. "SELECT_OPTION" is an element you'd like to select an option from. "WAIT" action should be used if there are no actions to take and there is some indication on screen that waiting could yield more actions. "WAIT" should not be used if there are actions to take. "SOLVE_CAPTCHA" should be used if there's a captcha to solve on the screen. "COMPLETE" is used when the {{"complete criterion has been met" if complete_criterion else "user goal has been achieved"}} AND if there's any data extraction goal, you should be able to get data from the page. Never return a COMPLETE action unless the {{ "complete criterion is met" if complete_criterion else "user goal is achieved" }}. "TERMINATE" is used to terminate the whole task with a failure when it doesn't seem like the user goal can be achieved. Do not use "TERMINATE" if waiting could lead the user towards the goal. Only return "TERMINATE" if you are on a page where the user goal cannot be achieved. All other actions are ignored when "TERMINATE" is returned. "KEYPRESS" is used to press a keyboard key when no clickable button or element achieves the same result. Only use KEYPRESS when pressing a key is the sole way to proceed (e.g., pressing Enter to submit a search with no search button, or Escape to close a modal with no close button). KEYPRESS does not require an element id. It requires the "key" field. "SCROLL" is used to scroll within a specific scrollable container on the page (not the page itself). Only use SCROLL when a required action is blocked because the target element is hidden, disabled, or unreachable until the container is scrolled (e.g., an "Agree" button that only becomes enabled after scrolling to the bottom of a terms and conditions box). Do not use SCROLL for general page navigation. Provide the element id of an interactable element within or near the scrollable container. Requires the "direction" field. SCROLL does not require a "text", "option", or "file_url" field.{{' "CLOSE_PAGE" is used to close the current page when it is impossible to achieve the user goal on the current page.' if has_magic_link_page else ''}}
"id": str, // The id of the element to take action on. The id has to be one from the elements list
"captcha_type": str, // The type of captcha for SOLVE_CAPTCHA action only. null if not SOLVE_CAPTCHA action. It's a string enum: "TEXT_CAPTCHA", "RECAPTCHA", "HCAPTCHA", "MTCAPTCHA", "FUNCAPTCHA", "CLOUDFLARE", "OTHER".
"text": str, // Text for INPUT_TEXT action only
"key": str, // The keyboard key to press for KEYPRESS action only. Allowed values: "Enter", "Tab", "Escape", "ArrowDown", "ArrowUp". null if not KEYPRESS action.
"direction": str, // The direction to scroll for SCROLL action only. Allowed values: "up", "down". null if not SCROLL action.
"file_url": str, // The url of the file to upload if applicable. This field must be present for UPLOAD_FILE but can also be present for CLICK only if the click is to upload the file. It should be null otherwise.
"download": bool, // Can only be true for CLICK or SELECT_OPTION actions. If true, the browser will trigger a download by clicking the element. If false, the browser will click the element without triggering a download.
"option": { // The option to select for SELECT_OPTION action only. null if not SELECT_OPTION action
"label": str, // the label of the option if any. MAKE SURE YOU USE THIS LABEL TO SELECT THE OPTION. DO NOT PUT ANYTHING OTHER THAN A VALID OPTION LABEL HERE
"index": int, // the index corresponding to the option index under the select element.
"value": str // the value of the option. MAKE SURE YOU USE THIS VALUE TO SELECT THE OPTION. DO NOT PUT ANYTHING OTHER THAN A VALID OPTION VALUE HERE
},
"click_context": { // The context for CLICK action only. null if not CLICK action
"thought": str, // Describe how you decided that this action is a single choice option or multi-choice option.
"single_option_click": bool, // True if the click is the only choice to proceed towards the goal, regardless of different user context or input. False if there are multiple valid options that depend on user input. Examples: clicking a login button to login is True (it's the only way to login); clicking a radio button for a multi-choice question (e.g., selecting "male", "female", or "other" for gender) is False (the choice depends on user input). When clicking on radio buttons, dropdown options, or any element that represents one of multiple possible selections, this should be False. If the click is intended to open the dropdown menu in order to select an option from the dropdown, this should be False.
}{% if parse_select_feature_enabled %},
"context": { // The context for INPUT_TEXT or SELECT_OPTION action only. null if not INPUT_TEXT or SELECT_OPTION action. Extract the following detailed information from the "reasoning", and double-check the information by analysing the HTML elements.
"thought": str, // A string describing how you double-check the context information to ensure accuracy.
"field": str, // Which field is this action intended to fill out?
"is_required": bool, // True if this is a required field, otherwise false.
"is_search_bar": bool, // True if the element to take the action is a search bar, otherwise false.
"is_location_input": bool, // True if the element asks the user to input where they live, otherwise false. For example, it asks for a location, an address, or other similar information. Output False if it only requires a ZIP code or postal code.
"is_date_related": bool, // True if the field is related to date input or select, otherwise false.
"date_format": str, // The format of the date or datetime to be input. For example YYYY-MM-DD, YYYY-MM-DD HH:MM:SS, DD.MM.YYYY, MM/DD/YYYY, etc. If the field is not related to date input or select, this should be null.
"is_text_captcha": bool, // True if the field is asking for a text captcha, otherwise false. Do not confuse it with the verification code. Text CAPTCHAs are typically displayed alongside an image of distorted letters or numbers.
}{% endif %}
}],{% if verification_code_check %}
"verification_code_reasoning": str, // Let's think step by step. Describe what you see and think if there is somewhere on the current page where you must enter the verification code now for login or any verification step. Explain why you believe a verification code needs to be entered somewhere or not. Do not imagine any place to enter the code if the code has not been sent yet.
"place_to_enter_verification_code": bool, // Whether there is a place on the current page to enter the verification code now.
"should_enter_verification_code": bool, // Whether the user should proceed to enter the verification code.
"should_verify_by_magic_link": bool // Whether the page instructs the user to check their email for a magic link to verify the login.{% endif %}
}
Consider the action history from the last step and the screenshot together. If actions from the last step didn't yield a positive impact, try other actions or other action combinations.
Action history from previous steps: (note: even if the action history suggests goal is achieved, check the screenshot and the DOM elements to make sure the goal is achieved)