Traditional approaches to browser automations required writing custom scripts for websites, often relying on DOM parsing and XPath-based interactions which would break whenever the website layouts changed.
Instead of only relying on code-defined XPath interactions, Skyvern relies on prompts in addition to computer vision and LLMs to parse items in the viewport in real-time, create a plan for interaction and interact with them.
This approach gives us a few advantages:
1. Skyvern can operate on websites it’s never seen before, as it’s able to map visual elements to actions necessary to complete a workflow, without any customized code
1. Skyvern is resistant to website layout changes, as there are no pre-determined XPaths or other selectors our system is looking for while trying to navigate
1. Skyvern is able to take a single workflow and apply it to a large number of websites, as it’s able to reason through the interactions necessary to complete the workflow
1. Skyvern leverages LLMs to reason through interactions to ensure we can cover complex situations. Examples include:
1. If you wanted to get an auto insurance quote from Geico, the answer to a common question “Were you eligible to drive at 18?” could be inferred from the driver receiving their license at age 16
1. If you were doing competitor analysis, it’s understanding that an Arnold Palmer 22 oz can at 7/11 is almost definitely the same product as a 23 oz can at Gopuff (even though the sizes are slightly different, which could be a rounding error!)
Instead of only relying on code-defined XPath interactions, Skyvern relies on Vision LLMs to interact with the websites.
Want to see examples of Skyvern in action? Jump to [#real-world-examples-of-skyvern](#real-world-examples-of-skyvern)
# How it works
Skyvern was inspired by the Task-Driven autonomous agent design popularized by [BabyAGI](https://github.com/yoheinakajima/babyagi) and [AutoGPT](https://github.com/Significant-Gravitas/AutoGPT) -- with one major bonus: we give Skyvern the ability to interact with websites using browser automation libraries like [Playwright](https://playwright.dev/).
Skyvern uses a swarm of agents to comprehend a website, and plan and execute its actions:
1.**Interactable Element Agent**: This agent is responsible for parsing the HTML of a website and extracting the interactable elements.
2.**Navigation Agent**: This agent is responsible for planning the navigation to complete a task. Examples include clicking buttons, inserting text, selecting options, etc.
3.**Data Extraction Agent**: This agent is responsible for extracting data from a website. It's capable of reading the tables and text on the page, and extracting the output in a user-defined structured format
4.**Password Agent**: This agent is responsible for filling out password forms on a website. It's capable of reading the username and password from a password manager, and filling out the form while preserving the privacy of the user-defined secrets.
5.**2FA Agent**: This agent is responsible for filling out 2FA forms on a website. It's capable of intercepting website requests for 2FAs, and either requesting user-defined APIs for 2FA codes or waiting for users to feed 2FA codes into it, and then completing the login process.
6.**Dynamic Auto-complete Agent**: This agent is responsible for filling out dynamic auto-complete forms on a website. It's capable of reading the options presented to it, selecting the appropriate option based on the user's input, and adjusting its inputs based on the feedback from inside the form. Popular examples include: Address forms, university dropdowns, and more.
We offer a managed cloud version of Skyvern that allows you to run Skyvern without having to manage the infrastructure. It allows you to run multiple Skyvern instances in parallel and comes bundled with anti-bot detection mechanisms, proxy network, and CAPTCHA solvers.
If you'd like to try it out,
1. Navigate to [app.skyvern.com](https://app.skyvern.com)
1. Create an account & Get $5 of credits on us
1. Kick off your first task and see Skyvern in action!
# Quickstart
This quickstart guide will walk you through getting Skyvern up and running on your local machine.
## Install & Run
> ⚠️ **REQUIREMENT**: This project requires Python 3.11 ⚠️
## Skyvern Cloud
[Skyvern Cloud](https://app.skyvern.com) is a managed cloud version of Skyvern that allows you to run Skyvern without worrying about the infrastructure. It allows you to run multiple Skyvern instances in parallel and comes bundled with anti-bot detection mechanisms, proxy network, and CAPTCHA solvers.
If you'd like to try it out, navigate to [app.skyvern.com](https://app.skyvern.com) and create an account.
## Local Install & Run
> ⚠️ **REQUIREMENT**: This project requires at least Python 3.11 ⚠️
1.**Install Skyvern**
```bash
pip install skyvern
```
2. **Configure Skyvern** Run the setup wizard which will guide you through the configuration process, including Skyvern [MCP](https://github.com/Skyvern-AI/skyvern/blob/main/integrations/mcp/README.md) integration. This will generate a `.env` as the configuration settings file.
```bash
skyvern init
```
```bash
pip install skyvern
```
3. **Launch the Skyvern Server**
2.**Run Skyvern**
```bash
skyvern run server
```
```bash
skyvern quickstart
```
4. **Launch the Skyvern UI**
```bash
skyvern run ui
```
5. **Check component status**
```bash
skyvern status
```
6. **Run task**
3.**Run task**
Run a skyvern task locally:
```python
from skyvern import Skyvern
skyvern = Skyvern()
task = await skyvern.run_task(prompt="Find the top post on hackernews today")
task = await skyvern.local.run_task(prompt="Find the top post on hackernews today")
print(task)
```
A local browser will pop up. Skyvern will start executing the task in the browser and close the it when the task is done. You will be able to review the task from http://localhost:8080/history
@@ -135,6 +85,28 @@ This quickstart guide will walk you through getting Skyvern up and running on yo
print(task)
```
### Helpful commands to debug issues
**Launch the Skyvern Server Separately**
```bash
skyvern run server
```
**Launch the Skyvern UI**
```bash
skyvern run ui
```
**Check status of the Skyvern service**
```bash
skyvern status
```
## Docker Compose setup
1. Make sure you have [Docker Desktop](https://www.docker.com/products/docker-desktop/) installed and running on your machine
@@ -147,45 +119,6 @@ This quickstart guide will walk you through getting Skyvern up and running on yo
```
3. Navigate to `http://localhost:8080` in your browser to start using the UI
## Local vs. Docker
Skyvern supports two startup options for local development:
**Before Option 1**
* If using Option 1, working in a virtual environment may improve the overall experience. Here's how to activate it:
```bash
python3 -m venv skyvern-env/
cd skyvern
source ../skyvern-env/bin/activate
pip install skyvern (Only if not already installed)
pip install -e .
```
Deactivation:
```bash
deactivate
```
**Option 1: Native (Postgres created by the CLI wizard)**
* Use the CLI wizard to spin up a disposable Postgres container, then run the backend natively.
* **Start with:**
```bash
skyvern init
skyvern run server
```
* This reuses the Postgres container created by the wizard.
**Option 2: Docker Compose (Postgres created with Compose)**
* Use Docker Compose to start all services, including a new Postgres container.
* **Start with:**
```bash
docker compose up -d
```
* This launches a new Postgres instance inside Compose.
> **Important:** Only one Postgres container can run on port 5432 at a time. If you switch from the CLI-managed Postgres to Docker Compose, you must first remove the original container:
> ```bash
> docker rm -f postgresql-container
@@ -193,21 +126,32 @@ Skyvern supports two startup options for local development:
If you encounter any database related errors while using Docker to run Skyvern, check which Postgres container is running with `docker ps`.
## Model Context Protocol (MCP)
See the MCP documentation [here](https://github.com/Skyvern-AI/skyvern/blob/main/integrations/mcp/README.md)
## Prompting Tips
# How it works
Skyvern was inspired by the Task-Driven autonomous agent design popularized by [BabyAGI](https://github.com/yoheinakajima/babyagi) and [AutoGPT](https://github.com/Significant-Gravitas/AutoGPT) -- with one major bonus: we give Skyvern the ability to interact with websites using browser automation libraries like [Playwright](https://playwright.dev/).
Here are some tips that may help you on your adventure:
1. Skyvern is really good at carrying out a single goal. If you give it too many instructions to do, it has a high likelihood of hallucinating along the way.
2. Being really explicit about goals is very important. For example, if you're generating an insurance quote, let it know very clearly how it can identify it has accomplished its goals. Use words like "COMPLETE" or "TERMINATE" to indicate success and failure modes, respectively.
3. Workflows can be used if you'd like to do more advanced things such as chaining multiple instructions together, or securely logging in. If you need any help with this, please feel free to book some time with us! We're always happy to help
Skyvern uses a swarm of agents to comprehend a website, and plan and execute its actions:
1. Skyvern can operate on websites it's never seen before, as it's able to map visual elements to actions necessary to complete a workflow, without any customized code
1. Skyvern is resistant to website layout changes, as there are no pre-determined XPaths or other selectors our system is looking for while trying to navigate
1. Skyvern is able to take a single workflow and apply it to a large number of websites, as it's able to reason through the interactions necessary to complete the workflow
1. Skyvern leverages LLMs to reason through interactions to ensure we can cover complex situations. Examples include:
1. If you wanted to get an auto insurance quote from Geico, the answer to a common question "Were you eligible to drive at 18?" could be inferred from the driver receiving their license at age 16
1. If you were doing competitor analysis, it's understanding that an Arnold Palmer 22 oz can at 7/11 is almost definitely the same product as a 23 oz can at Gopuff (even though the sizes are slightly different, which could be a rounding error!)
Skyvern 2.0 is a major overhaul of Skyvern that includes a multi-agent architecture with a planner + validator agent, allowing Skyvern to complete more complex tasks with a zero-shot prompt.
# Skyvern Features
## Skyvern Tasks
Tasks are the fundamental building block inside Skyvern. Each task is a single request to Skyvern, instructing it to navigate through a website and accomplish a specific goal.
@@ -257,26 +201,42 @@ You can also specify a `data_extraction_schema` directly within the main prompt
## File Downloading
Skyvern is also capable of downloading files from a website. All downloaded files are automatically uploaded to block storage (if configured), and you can access them via the UI.
## Authentication (Beta)
## Authentication
Skyvern supports a number of different authentication methods to make it easier to automate tasks behind a login. If you'd like to try it out, please reach out to us [via email](mailto:founders@skyvern.com) or [discord](https://discord.gg/fG2XXEuQX3).
- [PostgreSQL 14](https://www.postgresql.org/download/) (if you're on a Mac, setup script will install it for you if you have homebrew installed)
- `brew install postgresql`
## Setup (Contributors)
1. Clone the repository and navigate to the root directory
1. Open Docker Desktop (Works for Windows, macOS, and Linux) or run Docker Daemon
1. Run the setup script to install the necessary dependencies and setup your environment
```bash
./setup.sh
```
1. Start the server
```bash
./run_skyvern.sh
```
1. You can start sending requests to the server, but we built a simple UI to help you get started. To start the UI, run the following command:
```bash
./run_ui.sh
```
1. Navigate to `http://localhost:8080` in your browser to start using the UI
*The Skyvern CLI supports Windows, WSL, macOS, and Linux environments.*
## Additional Setup for Contributors
If you're looking to contribute to Skyvern, you'll need to install the pre-commit hooks to ensure code quality and consistency. You can do this by running the following command:
```bash
pre-commit install
```
# Documentation
More extensive documentation can be found on our [docs page](https://docs.skyvern.com). Please let us know if something is unclear or missing by opening an issue or reaching out to us [via email](mailto:founders@skyvern.com) or [discord](https://discord.gg/fG2XXEuQX3).
More extensive documentation can be found on our [📕 docs page](https://docs.skyvern.com). Please let us know if something is unclear or missing by opening an issue or reaching out to us [via email](mailto:founders@skyvern.com) or [discord](https://discord.gg/fG2XXEuQX3).
# Supported LLMs
| Provider | Supported Models |
@@ -391,47 +317,99 @@ More extensive documentation can be found on our [docs page](https://docs.skyver
| OpenAI-compatible | Any custom API endpoint that follows OpenAI's API format (via [liteLLM](https://docs.litellm.ai/docs/providers/openai_compatible)) |
| `ENABLE_BEDROCK` | Register AWS Bedrock models. To use AWS Bedrock, you need to make sure your [AWS configurations](https://github.com/boto/boto3?tab=readme-ov-file#using-boto3) are set up correctly first. | Boolean | `true`, `false` |
| `ENABLE_OPENAI_COMPATIBLE`| Register a custom OpenAI-compatible API endpoint | Boolean | `true`, `false` |
| `LLM_KEY` | The name of the model you want to use | String | Currently supported llm keys: `OPENAI_GPT4_TURBO`, `OPENAI_GPT4V`, `OPENAI_GPT4O`, `OPENAI_GPT4O_MINI`, `ANTHROPIC_CLAUDE3`, `ANTHROPIC_CLAUDE3_OPUS`, `ANTHROPIC_CLAUDE3_SONNET`, `ANTHROPIC_CLAUDE3_HAIKU`, `ANTHROPIC_CLAUDE3.5_SONNET`, `BEDROCK_ANTHROPIC_CLAUDE3_OPUS`, `BEDROCK_ANTHROPIC_CLAUDE3_SONNET`, `BEDROCK_ANTHROPIC_CLAUDE3_HAIKU`, `BEDROCK_ANTHROPIC_CLAUDE3.5_SONNET`, `AZURE_OPENAI`, `GEMINI_PRO`, `GEMINI_FLASH`, `BEDROCK_AMAZON_NOVA_PRO`, `BEDROCK_AMAZON_NOVA_LITE`, `OLLAMA`, `OPENROUTER`, `OPENAI_COMPATIBLE`|
| `SECONDARY_LLM_KEY` | The name of the model for mini agents skyvern runs with | String | Currently supported llm keys: `OPENAI_GPT4_TURBO`, `OPENAI_GPT4V`, `OPENAI_GPT4O`, `OPENAI_GPT4O_MINI`, `ANTHROPIC_CLAUDE3`, `ANTHROPIC_CLAUDE3_OPUS`, `ANTHROPIC_CLAUDE3_SONNET`, `ANTHROPIC_CLAUDE3_HAIKU`, `ANTHROPIC_CLAUDE3.5_SONNET`, `BEDROCK_ANTHROPIC_CLAUDE3_OPUS`, `BEDROCK_ANTHROPIC_CLAUDE3_SONNET`, `BEDROCK_ANTHROPIC_CLAUDE3_HAIKU`, `BEDROCK_ANTHROPIC_CLAUDE3.5_SONNET`, `AZURE_OPENAI`, `GEMINI_PRO`, `GEMINI_FLASH`, `NOVITA_DEEPSEEK_R1`, `NOVITA_DEEPSEEK_V3`, `NOVITA_LLAMA_3_3_70B`, `NOVITA_LLAMA_3_2_1B`, `NOVITA_LLAMA_3_2_3B`, `NOVITA_LLAMA_3_2_11B_VISION`, `NOVITA_LLAMA_3_1_8B`, `NOVITA_LLAMA_3_1_70B`, `NOVITA_LLAMA_3_1_405B`, `NOVITA_LLAMA_3_8B`, `NOVITA_LLAMA_3_70B`, `OLLAMA`, `OPENROUTER`, `OPENAI_COMPATIBLE`|
| `AZURE_API_BASE` | Azure deployment api base url| String | `https://skyvern-deployment.openai.azure.com/`|
| `AZURE_API_VERSION` | Azure API Version| String | `2024-02-01`|
Supported LLM Key: `AZURE_OPENAI`
##### AWS Bedrock
| Variable | Description| Type | Sample Value|
| -------- | ------- | ------- | ------- |
| `ENABLE_BEDROCK` | Register AWS Bedrock models. To use AWS Bedrock, you need to make sure your [AWS configurations](https://github.com/boto/boto3?tab=readme-ov-file#using-boto3) are set up correctly first. | Boolean | `true`, `false` |
| `LLM_KEY` | The name of the model you want to use | String | See supported LLM keys above |
| `SECONDARY_LLM_KEY` | The name of the model for mini agents skyvern runs with | String | See supported LLM keys above |
| `LLM_CONFIG_MAX_TOKENS` | Override the max tokens used by the LLM | Integer | `128000` |
# Feature Roadmap
This is our planned roadmap for the next few months. If you have any suggestions or would like to see a feature added, please don't hesitate to reach out to us [via email](mailto:founders@skyvern.com) or [discord](https://discord.gg/fG2XXEuQX3).
This is a wrapper around Skyvern.run_task that ensures it's used in local mode.
Args:
prompt: The prompt describing the task to run
engine: The engine to use for running the task
url: Optional URL to navigate to
webhook_url: Optional webhook URL for callbacks
totp_identifier: Optional TOTP identifier
totp_url: Optional TOTP verification URL
title: Optional title for the task
error_code_mapping: Optional mapping of error codes to messages
data_extraction_schema: Optional schema for data extraction
proxy_location: Optional proxy location
max_steps: Optional maximum number of steps
wait_for_completion: Whether to wait for task completion
timeout: Timeout in seconds
browser_session_id: Optional browser session ID
user_agent: Optional user agent string
Returns:
TaskRunResponse: The response from running the task
Raises:
ValueError: If an API key is provided (this function is for local mode only)
"""
skyvern=Skyvern()# Initialize in local mode (no API key)
returnawaitskyvern.run_task(
prompt=prompt,
engine=engine,
url=url,
webhook_url=webhook_url,
totp_identifier=totp_identifier,
totp_url=totp_url,
title=title,
error_code_mapping=error_code_mapping,
data_extraction_schema=data_extraction_schema,
proxy_location=proxy_location,
max_steps=max_steps,
wait_for_completion=wait_for_completion,
timeout=timeout,
browser_session_id=browser_session_id,
user_agent=user_agent,
)
def__init__(
self,
*,
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.