Add Claude 4.5 Opus support and improve SDK documentation (#4633)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Suchintan
2026-02-08 00:46:19 -05:00
committed by GitHub
parent 75af862841
commit 8162498952
4 changed files with 304 additions and 151 deletions

288
README.md

@@ -23,7 +23,7 @@
<a href="https://www.linkedin.com/company/95726232"><img src="https://img.shields.io/badge/Follow%20 on%20LinkedIn-8A2BE2?logo=linkedin"/></a>
</p>
[Skyvern](https://www.skyvern.com) automates browser-based workflows using LLMs and computer vision. It provides a simple API endpoint to fully automate manual workflows on a large number of websites, replacing brittle or unreliable automation solutions.
[Skyvern](https://www.skyvern.com) automates browser-based workflows using LLMs and computer vision. It provides a Playwright-compatible SDK that adds AI functionality on top of Playwright, as well as a no-code workflow builder to help both technical and non-technical users automate manual workflows on any website, replacing brittle or unreliable automation solutions.
<p align="center">
<img src="fern/images/geico_shu_recording_cropped.gif"/>
@@ -48,32 +48,12 @@ This approach has a few advantages:
1. Skyvern can operate on websites it's never seen before, as it's able to map visual elements to actions necessary to complete a workflow, without any customized code
1. Skyvern is resistant to website layout changes, as there are no pre-determined XPaths or other selectors our system is looking for while trying to navigate
1. Skyvern is able to take a single workflow and apply it to a large number of websites, as it's able to reason through the interactions necessary to complete the workflow
1. Skyvern leverages LLMs to reason through interactions to ensure we can cover complex situations. Examples include:
1. If you wanted to get an auto insurance quote from Geico, the answer to a common question "Were you eligible to drive at 18?" could be inferred from the driver receiving their license at age 16
1. If you were doing competitor analysis, it's understanding that an Arnold Palmer 22 oz can at 7/11 is almost definitely the same product as a 23 oz can at Gopuff (even though the sizes are slightly different, which could be a rounding error!)
A detailed technical report can be found [here](https://www.skyvern.com/blog/skyvern-2-0-state-of-the-art-web-navigation-with-85-8-on-webvoyager-eval/).
# Demo
<!-- Redo demo -->
https://github.com/user-attachments/assets/5cab4668-e8e2-4982-8551-aab05ff73a7f
# Performance & Evaluation
Skyvern has SOTA performance on the [WebBench benchmark](webbench.ai) with a 64.4% accuracy. The technical report + evaluation can be found [here](https://www.skyvern.com/blog/web-bench-a-new-way-to-compare-ai-browser-agents/)
<p align="center">
<img src="fern/images/performance/webbench_overall.png"/>
</p>
## Performance on WRITE tasks (eg filling out forms, logging in, downloading files, etc)
Skyvern is the best performing agent on WRITE tasks (eg filling out forms, logging in, downloading files, etc), which is primarily used for RPA (Robotic Process Automation) adjacent tasks.
<p align="center">
<img src="fern/images/performance/webbench_write.png"/>
</p>
# Quickstart
## Skyvern Cloud
@@ -81,7 +61,11 @@ Skyvern is the best performing agent on WRITE tasks (eg filling out forms, loggi
If you'd like to try it out, navigate to [app.skyvern.com](https://app.skyvern.com) and create an account.
## Install & Run
## Run Locally (UI + Server)
Choose your preferred setup method:
### Option A: pip install (Recommended)
Dependencies needed:
- [Python 3.11.x](https://www.python.org/downloads/), works with 3.12, not ready yet for 3.13
@@ -91,14 +75,13 @@ Additionally, for Windows:
- [Rust](https://rustup.rs/)
- VS Code with C++ dev tools and Windows SDK
### 1. Install Skyvern
#### 1. Install Skyvern
```bash
pip install skyvern
```
### 2. Run Skyvern
This is most helpful for first time run (db setup, db migrations etc).
#### 2. Run Skyvern
```bash
skyvern quickstart
@@ -111,20 +94,140 @@ local Docker PostgreSQL setup:
skyvern quickstart --database-string "postgresql+psycopg://user:password@localhost:5432/skyvern"
```
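The `--database-string` value above looks like a SQLAlchemy-style URL (`dialect+driver://user:password@host:port/database`). A minimal sketch, using only the Python standard library, of how the sample value breaks down into its parts. This is not Skyvern's own parsing, and `describe_database_string` is a hypothetical helper name used only for illustration:

```python
from urllib.parse import urlsplit


def describe_database_string(database_string: str) -> dict:
    """Split a dialect+driver://user:password@host:port/db URL into named parts."""
    parts = urlsplit(database_string)
    # The scheme carries both the dialect and the optional driver, joined by "+".
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# Sample value copied from the quickstart command above.
info = describe_database_string(
    "postgresql+psycopg://user:password@localhost:5432/skyvern"
)
print(info)
```

Checking the host and port this way before running `skyvern quickstart` can help confirm the string points at the Postgres instance you intend to use.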
### 3. Run task
### Option B: Docker Compose
#### UI (Recommended)
1. Install [Docker Desktop](https://www.docker.com/products/docker-desktop/)
2. Clone the repository:
```bash
git clone https://github.com/skyvern-ai/skyvern.git && cd skyvern
```
3. Run quickstart with Docker Compose:
```bash
pip install skyvern && skyvern quickstart
```
When prompted, choose "Docker Compose" for the full containerized setup.
4. Navigate to http://localhost:8080
Start the Skyvern service and UI (when DB is up and running)
## SDK
**Skyvern is a Playwright extension that adds AI-powered browser automation.** It gives you the full power of Playwright with additional AI capabilities—use natural language prompts to interact with elements, extract data, and automate complex multi-step workflows.
**Installation:**
- Python: `pip install skyvern` then run `skyvern quickstart` for local setup
- TypeScript: `npm install @skyvern/client`
### AI-Powered Page Commands
Skyvern adds four core AI commands directly on the page object:
| Command | Description |
|---------|-------------|
| `page.act(prompt)` | Perform actions using natural language (e.g., "Click the login button") |
| `page.extract(prompt, schema)` | Extract structured data from the page with optional JSON schema |
| `page.validate(prompt)` | Validate page state, returns `bool` (e.g., "Check if user is logged in") |
| `page.prompt(prompt, schema)` | Send arbitrary prompts to the LLM with optional response schema |
Additionally, `page.agent` provides higher-level workflow commands:
| Command | Description |
|---------|-------------|
| `page.agent.run_task(prompt)` | Execute complex multi-step tasks |
| `page.agent.login(credential_type, credential_id)` | Authenticate with stored credentials (Skyvern, Bitwarden, 1Password) |
| `page.agent.download_files(prompt)` | Navigate and download files |
| `page.agent.run_workflow(workflow_id)` | Execute pre-built workflows |
### AI-Augmented Playwright Actions
All standard Playwright actions support an optional `prompt` parameter for AI-powered element location:
| Action | Playwright | AI-Augmented |
|--------|------------|--------------|
| Click | `page.click("#btn")` | `page.click(prompt="Click login button")` |
| Fill | `page.fill("#email", "a@b.com")` | `page.fill(prompt="Email field", value="a@b.com")` |
| Select | `page.select_option("#country", "US")` | `page.select_option(prompt="Country dropdown", value="US")` |
| Upload | `page.upload_file("#file", "doc.pdf")` | `page.upload_file(prompt="Upload area", files="doc.pdf")` |
**Three interaction modes:**
```python
# 1. Traditional Playwright - CSS/XPath selectors
await page.click("#submit-button")
# 2. AI-powered - natural language
await page.click(prompt="Click the green Submit button")
# 3. AI fallback - tries selector first, falls back to AI if it fails
await page.click("#submit-btn", prompt="Click the Submit button")
```
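The third mode (selector first, AI as fallback) can be pictured as a plain try/except around two callables. The sketch below only illustrates that control flow and is not Skyvern's implementation: `run_with_fallback`, `broken_selector`, `working_selector`, and `ai_action` are hypothetical names, and the stub actions stand in for real browser calls.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Action = Callable[[], Awaitable[None]]


async def run_with_fallback(
    selector_action: Optional[Action],
    ai_action: Optional[Action],
) -> str:
    """Try the selector-based action first; use the AI action if it fails.

    Returns which path was taken ("selector" or "ai").
    """
    if selector_action is not None:
        try:
            await selector_action()
            return "selector"
        except Exception:
            # No AI fallback available: surface the original failure.
            if ai_action is None:
                raise
    if ai_action is None:
        raise ValueError("need a selector action, an AI action, or both")
    await ai_action()
    return "ai"


# Hypothetical stand-ins for real page interactions.
async def broken_selector() -> None:
    raise LookupError("no element matches #submit-btn")


async def working_selector() -> None:
    return None


async def ai_action() -> None:
    return None


used_when_selector_fails = asyncio.run(run_with_fallback(broken_selector, ai_action))
used_when_selector_works = asyncio.run(run_with_fallback(working_selector, ai_action))
print(used_when_selector_fails, used_when_selector_works)
```

In this sketch the AI path only runs when the selector path raises, so the cheaper selector lookup is always attempted first when one is given.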
### Core AI Commands - Examples
```python
# act - Perform actions using natural language
await page.act("Click the login button and wait for the dashboard to load")
# extract - Extract structured data with optional JSON schema
result = await page.extract("Get the product name and price")
result = await page.extract(
prompt="Extract order details",
schema={"order_id": "string", "total": "number", "items": "array"}
)
# validate - Check page state (returns bool)
is_logged_in = await page.validate("Check if the user is logged in")
# prompt - Send arbitrary prompts to the LLM
summary = await page.prompt("Summarize what's on this page")
```
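The README does not say what shape `page.extract` returns, so callers may want to check the result against the schema they passed. A minimal sketch, assuming the result is a plain `dict` and that the schema uses the simplified `field: type-name` form shown in the example above. `schema_mismatches` and `TYPE_NAMES` are hypothetical helpers, not part of the SDK:

```python
from typing import Any

# Maps the type names used in the schema example to Python types (an assumption).
TYPE_NAMES = {
    "string": str,
    "number": (int, float),
    "array": list,
    "object": dict,
    "boolean": bool,
}


def schema_mismatches(result: dict[str, Any], schema: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the result fits."""
    problems = []
    for field, type_name in schema.items():
        if field not in result:
            problems.append(f"missing field: {field}")
            continue
        value = result[field]
        # bool is a subclass of int in Python, so reject it explicitly for "number".
        is_bool_as_number = type_name == "number" and isinstance(value, bool)
        if is_bool_as_number or not isinstance(value, TYPE_NAMES[type_name]):
            problems.append(
                f"{field}: expected {type_name}, got {type(value).__name__}"
            )
    return problems


schema = {"order_id": "string", "total": "number", "items": "array"}
good = {"order_id": "A-1001", "total": 42.5, "items": ["widget"]}
bad = {"order_id": "A-1002", "total": "42.50"}

print(schema_mismatches(good, schema))
print(schema_mismatches(bad, schema))
```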
### Quick Start Examples
**Run via UI:**
```bash
skyvern run all
```
Navigate to http://localhost:8080 to run tasks through the web interface.
Go to http://localhost:8080 and use the UI to run a task
**Python SDK:**
```python
from skyvern import Skyvern
#### Code
# Local mode
skyvern = Skyvern.local()
# Or connect to Skyvern Cloud
skyvern = Skyvern(api_key="your-api-key")
# Launch browser and get page
browser = await skyvern.launch_cloud_browser()
page = await browser.get_working_page()
# Mix Playwright with AI-powered actions
await page.goto("https://example.com")
await page.click("#login-button") # Traditional Playwright
await page.agent.login(credential_type="skyvern", credential_id="cred_123") # AI login
await page.click(prompt="Add first item to cart") # AI-augmented click
await page.agent.run_task("Complete checkout with: John Snow, 12345") # AI task
```
**TypeScript SDK:**
```typescript
import { Skyvern } from "@skyvern/client";
const skyvern = new Skyvern({ apiKey: "your-api-key" });
const browser = await skyvern.launchCloudBrowser();
const page = await browser.getWorkingPage();
// Mix Playwright with AI-powered actions
await page.goto("https://example.com");
await page.click("#login-button"); // Traditional Playwright
await page.agent.login("skyvern", { credentialId: "cred_123" }); // AI login
await page.click({ prompt: "Add first item to cart" }); // AI-augmented click
await page.agent.runTask("Complete checkout with: John Snow, 12345"); // AI task
await browser.close();
```
**Simple task execution:**
```python
from skyvern import Skyvern
@@ -132,88 +235,6 @@ skyvern = Skyvern()
task = await skyvern.run_task(prompt="Find the top post on hackernews today")
print(task)
```
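Note that the snippet above uses `await` at the top level, which works in a notebook or async REPL but is a syntax error in a plain Python script. A minimal sketch of the usual wrapper, using `asyncio.run`. `StubClient` is a hypothetical stand-in so the example runs without the `skyvern` package installed; in real use you would construct `Skyvern()` as in the README example instead:

```python
import asyncio


class StubClient:
    """Hypothetical stand-in for the README's Skyvern client."""

    async def run_task(self, prompt: str) -> dict:
        # A real client would drive a browser here; the stub just echoes the prompt.
        return {"prompt": prompt, "status": "stubbed"}


async def main() -> dict:
    client = StubClient()  # assumption: replace with Skyvern() in real use
    task = await client.run_task(prompt="Find the top post on hackernews today")
    return task


# asyncio.run creates the event loop that top-level `await` would otherwise need.
result = asyncio.run(main())
print(result)
```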
Skyvern starts running the task in a browser that pops up and closes it when the task is done. You will be able to view the task from http://localhost:8080/history
You can also run a task on different targets:
```python
from skyvern import Skyvern
# Run on Skyvern Cloud
skyvern = Skyvern(api_key="SKYVERN API KEY")
# Local Skyvern service
skyvern = Skyvern(base_url="http://localhost:8000", api_key="LOCAL SKYVERN API KEY")
task = await skyvern.run_task(prompt="Find the top post on hackernews today")
print(task)
```
## SDK
**Installation:**
- Python: `pip install skyvern` then run `skyvern quickstart` for local setup
- TypeScript: `npm install @skyvern/client`
Skyvern provides SDKs for both Python and TypeScript to integrate browser automation into your applications.
### Python SDK
```python
from skyvern import Skyvern
# Connect to Skyvern Cloud
skyvern = Skyvern(api_key="your-api-key")
# Or run locally
skyvern = Skyvern.local()
# Launch a cloud browser
browser = await skyvern.launch_cloud_browser()
page = await browser.get_working_page()
# Use AI-powered actions for complex workflows
await page.agent.run_task("Navigate to the most recent invoice and download it")
# Or mix with Playwright actions
await page.goto("https://example.com")
await page.click("#button")
```
### TypeScript SDK
```typescript
import { Skyvern } from "@skyvern/client";
// Connect to Skyvern Cloud
const skyvern = new Skyvern({ apiKey: "your-api-key" });
// Launch a cloud browser
const browser = await skyvern.launchCloudBrowser();
const page = await browser.getWorkingPage();
// Use AI-powered actions for complex workflows
await page.agent.runTask("Navigate to the most recent invoice and download it");
// Or mix with Playwright actions
await page.goto("https://example.com");
await page.click("#button");
```
Skyvern enhances Playwright methods with AI capabilities. Use regular Playwright syntax with a `prompt` parameter to make any action AI-powered:
```python
# Traditional Playwright - uses selectors
await page.click("#submit-button")
# AI-augmented Playwright - uses natural language
await page.click(prompt="Click on the Submit button")
await page.fill(prompt="Enter email address", value="user@example.com")
# Mix both approaches in the same workflow
await page.goto("https://example.com/dashboard")
await page.click(prompt="Click on the most recent unpaid invoice")
await page.click("#download-button")
```
## Advanced Usage
@@ -311,28 +332,21 @@ skyvern stop ui
skyvern stop server
```
## Docker Compose setup
# Performance & Evaluation
1. Make sure you have [Docker Desktop](https://www.docker.com/products/docker-desktop/) installed and running on your machine
1. Make sure you don't have postgres running locally (Run `docker ps` to check)
1. Clone the repository and navigate to the root directory
1. Run `skyvern init llm` to generate a `.env` file. This will be copied into the Docker image.
1. Fill in the LLM provider key on the [docker-compose.yml](./docker-compose.yml). *If you want to run Skyvern on a remote server, make sure you set the correct server ip for the UI container in [docker-compose.yml](./docker-compose.yml).*
2. Run the following command via the commandline:
```bash
docker compose up -d
```
3. Navigate to `http://localhost:8080` in your browser to start using the UI
Skyvern has SOTA performance on the [WebBench benchmark](https://webbench.ai) with 64.4% accuracy. The technical report and evaluation can be found [here](https://www.skyvern.com/blog/web-bench-a-new-way-to-compare-ai-browser-agents/)
> [!Important]
> Only one Postgres container can run on port 5432 at a time. If you switch from the CLI-managed Postgres to Docker Compose, you must first remove the original container:
> ```bash
> docker rm -f postgresql-container
> ```
<p align="center">
<img src="fern/images/performance/webbench_overall.png"/>
</p>
If you encounter any database related errors while using Docker to run Skyvern, check which Postgres container is running with `docker ps`.
## Performance on WRITE tasks (eg filling out forms, logging in, downloading files, etc)
Skyvern is the best-performing agent on WRITE tasks (e.g., filling out forms, logging in, downloading files), which are primarily used for RPA (Robotic Process Automation) adjacent tasks.
<p align="center">
<img src="fern/images/performance/webbench_write.png"/>
</p>
# Skyvern Features
@@ -494,11 +508,11 @@ More extensive documentation can be found on our [📕 docs page](https://www.sk
# Supported LLMs
| Provider | Supported Models |
| -------- | ------- |
| OpenAI | gpt4-turbo, gpt-4o, gpt-4o-mini |
| Anthropic | Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet) |
| OpenAI | GPT-5, GPT-5.2, GPT-4.1, o3, o4-mini |
| Anthropic | Claude 4 (Sonnet, Opus), Claude 4.5 (Haiku, Sonnet, Opus) |
| Azure OpenAI | Any GPT models. Better performance with a multimodal llm (azure/gpt4-o) |
| AWS Bedrock | Anthropic Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet) |
| Gemini | Gemini 2.5 Pro and flash, Gemini 2.0 |
| AWS Bedrock | Claude 3.5, Claude 3.7, Claude 4 (Sonnet, Opus), Claude 4.5 (Sonnet, Opus) |
| Gemini | Gemini 3 Pro/Flash, Gemini 2.5 Pro/Flash |
| Ollama | Run any locally hosted model via [Ollama](https://github.com/ollama/ollama) |
| OpenRouter | Access models through [OpenRouter](https://openrouter.ai) |
| OpenAI-compatible | Any custom API endpoint that follows OpenAI's API format (via [liteLLM](https://docs.litellm.ai/docs/providers/openai_compatible)) |
@@ -513,7 +527,7 @@ More extensive documentation can be found on our [📕 docs page](https://www.sk
| `OPENAI_API_BASE` | OpenAI API Base, optional | String | `https://openai.api.base` |
| `OPENAI_ORGANIZATION` | OpenAI Organization ID, optional | String | `your-org-id` |
Recommended `LLM_KEY`: `OPENAI_GPT4O`, `OPENAI_GPT4O_MINI`, `OPENAI_GPT4_1`, `OPENAI_O4_MINI`, `OPENAI_O3`
Recommended `LLM_KEY`: `OPENAI_GPT5`, `OPENAI_GPT5_2`, `OPENAI_GPT4_1`, `OPENAI_O3`, `OPENAI_O4_MINI`
##### Anthropic
| Variable | Description| Type | Sample Value|
@@ -521,7 +535,7 @@ Recommended `LLM_KEY`: `OPENAI_GPT4O`, `OPENAI_GPT4O_MINI`, `OPENAI_GPT4_1`, `OP
| `ENABLE_ANTHROPIC` | Register Anthropic models| Boolean | `true`, `false` |
| `ANTHROPIC_API_KEY` | Anthropic API key| String | `sk-1234567890` |
Recommended`LLM_KEY`: `ANTHROPIC_CLAUDE3.5_SONNET`, `ANTHROPIC_CLAUDE3.7_SONNET`, `ANTHROPIC_CLAUDE4_OPUS`, `ANTHROPIC_CLAUDE4_SONNET`
Recommended `LLM_KEY`: `ANTHROPIC_CLAUDE4.5_OPUS`, `ANTHROPIC_CLAUDE4.5_SONNET`, `ANTHROPIC_CLAUDE4_OPUS`, `ANTHROPIC_CLAUDE4_SONNET`
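Since this change swaps the recommended Anthropic keys, a quick pre-flight check of the environment may help catch stale values. The sketch below is an illustration only and not how Skyvern validates its settings: `anthropic_config_problems` is a hypothetical helper, and treating `"true"` case-insensitively is an assumption.

```python
from typing import Mapping

# Copied from the updated "Recommended LLM_KEY" line above.
RECOMMENDED_ANTHROPIC_KEYS = {
    "ANTHROPIC_CLAUDE4.5_OPUS",
    "ANTHROPIC_CLAUDE4.5_SONNET",
    "ANTHROPIC_CLAUDE4_OPUS",
    "ANTHROPIC_CLAUDE4_SONNET",
}


def anthropic_config_problems(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems with the Anthropic-related variables in env."""
    problems = []
    if env.get("ENABLE_ANTHROPIC", "").strip().lower() != "true":
        problems.append("ENABLE_ANTHROPIC is not set to true")
    if not env.get("ANTHROPIC_API_KEY", "").strip():
        problems.append("ANTHROPIC_API_KEY is empty")
    llm_key = env.get("LLM_KEY", "")
    if llm_key not in RECOMMENDED_ANTHROPIC_KEYS:
        problems.append(
            f"LLM_KEY {llm_key!r} is not one of the recommended Anthropic keys"
        )
    return problems


current = {
    "ENABLE_ANTHROPIC": "true",
    "ANTHROPIC_API_KEY": "sk-1234567890",
    "LLM_KEY": "ANTHROPIC_CLAUDE4.5_SONNET",
}
stale = {
    "ENABLE_ANTHROPIC": "true",
    "ANTHROPIC_API_KEY": "sk-1234567890",
    "LLM_KEY": "ANTHROPIC_CLAUDE3.5_SONNET",
}

print(anthropic_config_problems(current))
print(anthropic_config_problems(stale))
```

To check a real shell environment, pass `os.environ` as the `env` argument.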
##### Azure OpenAI
| Variable | Description| Type | Sample Value|
@@ -539,7 +553,7 @@ Recommended `LLM_KEY`: `AZURE_OPENAI`
| -------- | ------- | ------- | ------- |
| `ENABLE_BEDROCK` | Register AWS Bedrock models. To use AWS Bedrock, you need to make sure your [AWS configurations](https://github.com/boto/boto3?tab=readme-ov-file#using-boto3) are set up correctly first. | Boolean | `true`, `false` |
Recommended `LLM_KEY`: `BEDROCK_ANTHROPIC_CLAUDE3.7_SONNET_INFERENCE_PROFILE`, `BEDROCK_ANTHROPIC_CLAUDE4_OPUS_INFERENCE_PROFILE`, `BEDROCK_ANTHROPIC_CLAUDE4_SONNET_INFERENCE_PROFILE`
Recommended `LLM_KEY`: `BEDROCK_ANTHROPIC_CLAUDE4.5_OPUS_INFERENCE_PROFILE`, `BEDROCK_ANTHROPIC_CLAUDE4.5_SONNET_INFERENCE_PROFILE`, `BEDROCK_ANTHROPIC_CLAUDE4_OPUS_INFERENCE_PROFILE`
##### Gemini
| Variable | Description| Type | Sample Value|
@@ -547,7 +561,7 @@ Recommended `LLM_KEY`: `BEDROCK_ANTHROPIC_CLAUDE3.7_SONNET_INFERENCE_PROFILE`, `
| `ENABLE_GEMINI` | Register Gemini models| Boolean | `true`, `false` |
| `GEMINI_API_KEY` | Gemini API Key| String | `your_google_gemini_api_key`|
Recommended `LLM_KEY`: `GEMINI_2.5_PRO_PREVIEW`, `GEMINI_2.5_FLASH_PREVIEW`
Recommended `LLM_KEY`: `GEMINI_2.5_PRO`, `GEMINI_2.5_FLASH`, `GEMINI_2.5_PRO_PREVIEW`, `GEMINI_2.5_FLASH_PREVIEW`
##### Ollama
| Variable | Description| Type | Sample Value|