feat: new self-hosting docs (#4689)
Co-authored-by: Ritik Sahni <ritiksahni0203@gmail.com>
@@ -52,12 +52,12 @@ services:
 # - ENABLE_OPENAI=true
 # - LLM_KEY=OPENAI_GPT4O
 # - OPENAI_API_KEY=<your_openai_key>
-# Gemini Support:
-# Gemini is a new LLM provider that is currently in beta. You can use it by uncommenting the following lines and filling in your Gemini API key.
-# - LLM_KEY=GEMINI
-# - ENABLE_GEMINI=true
-# - GEMINI_API_KEY=YOUR_GEMINI_KEY
-# - LLM_KEY=GEMINI_2.5_PRO_PREVIEW_03_25
+# Gemini Support (via Vertex AI):
+# - ENABLE_VERTEX_AI=true
+# - LLM_KEY=VERTEX_GEMINI_3.0_FLASH
+# - GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
+# - GCP_PROJECT_ID=your-gcp-project-id
+# - GCP_REGION=us-central1
 # If you want to use other LLM provider, like azure and anthropic:
 # - ENABLE_ANTHROPIC=true
 # - LLM_KEY=ANTHROPIC_CLAUDE3.5_SONNET
@@ -33,6 +33,18 @@
       "running-automations/extract-structured-data"
     ]
   },
+  {
+    "group": "Self-Hosted Deployment",
+    "pages": [
+      "self-hosted/overview",
+      "self-hosted/docker",
+      "self-hosted/llm-configuration",
+      "self-hosted/browser",
+      "self-hosted/proxy",
+      "self-hosted/kubernetes",
+      "self-hosted/storage"
+    ]
+  },
   {
     "group": "Multi-Step Automations",
     "pages": [
262
docs/self-hosted/browser.mdx
Normal file
@@ -0,0 +1,262 @@
---
title: Browser Configuration
subtitle: Configure browser modes, display settings, and external Chrome connections
slug: self-hosted/browser
---

Skyvern uses Playwright with Chromium to execute browser automations.

## Browser modes

The `BROWSER_TYPE` setting controls how Skyvern runs the browser.

### Headful (default)

```bash .env
BROWSER_TYPE=chromium-headful
```

The browser runs with a visible window. In Docker, this displays on a virtual framebuffer (Xvfb).

### Headless

```bash .env
BROWSER_TYPE=chromium-headless
```

The browser runs without any display.

<Note>
Some websites detect and block headless browsers. If you encounter issues with bot detection, try headful mode with a virtual display.
</Note>

### CDP Connect (External Chrome)

```bash .env
BROWSER_TYPE=cdp-connect
BROWSER_REMOTE_DEBUGGING_URL=http://host.docker.internal:9222/
```

Connect to an existing Chrome instance running with remote debugging enabled. Useful for:
- Using your existing browser profile with saved logins
- Debugging with Chrome DevTools
- Running automations on a browser with specific extensions

---

## Setting up CDP Connect

CDP (Chrome DevTools Protocol) lets Skyvern control an external Chrome browser instead of launching its own.

### Step 1: Start Chrome with remote debugging

<CodeGroup>
```bash macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 \
  --user-data-dir="$HOME/chrome-cdp-profile" \
  --no-first-run \
  --no-default-browser-check
```

```powershell Windows
# The call operator (&) is needed to run an executable given as a quoted path
& "C:\Program Files\Google\Chrome\Application\chrome.exe" `
  --remote-debugging-port=9222 `
  --user-data-dir="C:\chrome-cdp-profile" `
  --no-first-run `
  --no-default-browser-check
```

```bash Linux
google-chrome \
  --remote-debugging-port=9222 \
  --user-data-dir="$HOME/chrome-cdp-profile" \
  --no-first-run \
  --no-default-browser-check
```
</CodeGroup>

The `--user-data-dir` flag creates a separate profile for Skyvern, leaving your main Chrome profile untouched.

### Step 2: Configure Skyvern

```bash .env
BROWSER_TYPE=cdp-connect
BROWSER_REMOTE_DEBUGGING_URL=http://host.docker.internal:9222/
```

When Skyvern runs in Docker, the URL depends on the host OS:

| Host OS | URL |
|---------|-----|
| macOS/Windows | `http://host.docker.internal:9222/` |
| Linux | `http://172.17.0.1:9222/` |

On Linux, `172.17.0.1` is the default address of Docker's `docker0` bridge. Substitute your own gateway address if your Docker network is configured differently.

### Step 3: Verify connection

Confirm that Chrome is accepting CDP connections by running this on the machine where Chrome is running:

```bash
curl http://localhost:9222/json/version
```

You should see Chrome's version information.

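The same check can be scripted. The sketch below is a minimal illustration, not part of Skyvern: the helper names `version_endpoint` and `summarize_version` are hypothetical, and the sample response body is made up for the example. It builds the `/json/version` URL from a CDP base URL and pulls out the two fields that are most useful when debugging a connection.

```python
import json
from urllib.parse import urljoin


def version_endpoint(debugging_url: str) -> str:
    """Return the /json/version URL for a CDP base URL (hypothetical helper)."""
    # Ensure a trailing slash so urljoin appends to the base instead of replacing a segment.
    base = debugging_url if debugging_url.endswith("/") else debugging_url + "/"
    return urljoin(base, "json/version")


def summarize_version(payload: str) -> dict:
    """Extract the browser name and WebSocket URL from a /json/version response body."""
    info = json.loads(payload)
    return {
        "browser": info.get("Browser"),
        "websocket": info.get("webSocketDebuggerUrl"),
    }


# Illustrative response body; real values depend on your Chrome build.
sample = (
    '{"Browser": "Chrome/120.0.0.0", '
    '"webSocketDebuggerUrl": "ws://localhost:9222/devtools/browser/abc"}'
)
print(version_endpoint("http://localhost:9222"))
print(summarize_version(sample))
```

To run it against a live browser, fetch the body with `urllib.request.urlopen(version_endpoint(url)).read().decode()` and pass the result to `summarize_version`. A missing `webSocketDebuggerUrl` usually means the endpoint that answered is not a Chrome debugging port.
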
<Warning>
Anyone who can reach the remote debugging port can fully control the browser. Only use CDP mode in trusted environments, and bind the port to localhost when possible.
</Warning>

---

## Display settings

Configure how the browser appears to websites. These settings affect geolocation detection and content rendering.

### Locale and timezone

```bash .env
# Optional, unset by default. Set to match your target region.
BROWSER_LOCALE=en-US
BROWSER_TIMEZONE=America/New_York
```

Set these to match your target audience or the expected user location. A locale that does not match the timezone can trigger bot detection on some sites.

Common combinations:

| Region | BROWSER_LOCALE | BROWSER_TIMEZONE |
|--------|----------------|------------------|
| US East | `en-US` | `America/New_York` |
| US West | `en-US` | `America/Los_Angeles` |
| UK | `en-GB` | `Europe/London` |
| Germany | `de-DE` | `Europe/Berlin` |
| Japan | `ja-JP` | `Asia/Tokyo` |

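If you deploy to several regions, keeping the locale and timezone paired in one place helps avoid mismatches. The following sketch simply encodes the table above as a lookup; `PRESETS` and `env_lines` are hypothetical names used for illustration.

```python
# Region presets copied from the table above.
PRESETS = {
    "US East": ("en-US", "America/New_York"),
    "US West": ("en-US", "America/Los_Angeles"),
    "UK": ("en-GB", "Europe/London"),
    "Germany": ("de-DE", "Europe/Berlin"),
    "Japan": ("ja-JP", "Asia/Tokyo"),
}


def env_lines(region: str) -> list[str]:
    """Return the two .env lines for a region, always as a matching pair."""
    locale, timezone = PRESETS[region]
    return [f"BROWSER_LOCALE={locale}", f"BROWSER_TIMEZONE={timezone}"]


print("\n".join(env_lines("Japan")))
```

An unknown region raises `KeyError`, which is intentional here: failing loudly is preferable to silently writing a locale without its timezone.
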
### Viewport size

```bash .env
BROWSER_WIDTH=1920
BROWSER_HEIGHT=1080
```

The default 1920x1080 works for most sites.

---

## Timeout settings

Control how long Skyvern waits for various browser operations.

```bash .env
# Time to wait for individual actions (clicks, typing)
BROWSER_ACTION_TIMEOUT_MS=5000

# Time to wait for screenshots to capture
BROWSER_SCREENSHOT_TIMEOUT_MS=20000

# Time to wait for page loads
BROWSER_LOADING_TIMEOUT_MS=60000

# Time to wait for DOM tree analysis
BROWSER_SCRAPING_BUILDING_ELEMENT_TREE_TIMEOUT_MS=60000
```

### When to adjust timeouts

| Symptom | Adjustment |
|---------|------------|
| Actions fail on slow sites | Increase `BROWSER_ACTION_TIMEOUT_MS` |
| Screenshots time out on complex pages | Increase `BROWSER_SCREENSHOT_TIMEOUT_MS` |
| Pages time out while loading | Increase `BROWSER_LOADING_TIMEOUT_MS` |
| DOM analysis fails on large pages | Increase `BROWSER_SCRAPING_BUILDING_ELEMENT_TREE_TIMEOUT_MS` |

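When tuning timeouts it is easy to lose track of which values have been changed. The sketch below, which is an illustration rather than a Skyvern tool, parses `.env`-style text and reports the timeouts that differ from the values in the example block above. It assumes those example values are the baseline, and `parse_env` and `changed_timeouts` are hypothetical helper names.

```python
# Values from the example block above, treated here as the baseline (an assumption).
BASELINE_MS = {
    "BROWSER_ACTION_TIMEOUT_MS": 5000,
    "BROWSER_SCREENSHOT_TIMEOUT_MS": 20000,
    "BROWSER_LOADING_TIMEOUT_MS": 60000,
    "BROWSER_SCRAPING_BUILDING_ELEMENT_TREE_TIMEOUT_MS": 60000,
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def changed_timeouts(env_text: str) -> dict[str, tuple[int, int]]:
    """Map each overridden timeout to (baseline_ms, configured_ms)."""
    env = parse_env(env_text)
    changes = {}
    for name, baseline in BASELINE_MS.items():
        if name in env and int(env[name]) != baseline:
            changes[name] = (baseline, int(env[name]))
    return changes


example = """
# Slow target site
BROWSER_ACTION_TIMEOUT_MS=15000
BROWSER_LOADING_TIMEOUT_MS=60000
"""
print(changed_timeouts(example))
```

The parser handles only plain `KEY=VALUE` lines. Quoted values, `export` prefixes, and inline comments would need extra handling.
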
---

## Advanced settings

### Browser logging

```bash .env
BROWSER_LOGS_ENABLED=true
```

When enabled, browser console logs are captured in artifacts. Useful for debugging JavaScript errors on target sites.

### Maximum pages

```bash .env
BROWSER_MAX_PAGES_NUMBER=10
```

Limits the number of browser tabs Skyvern can open simultaneously. Increase if your workflows navigate multiple pages in parallel; decrease to reduce memory usage.

### Chrome policies

```bash .env
BROWSER_POLICY_FILE=/etc/chromium/policies/managed/policies.json
```

Path to a Chrome policy file for enterprise configurations.

### Video recording path

```bash .env
VIDEO_PATH=./videos
```

Directory where browser session recordings are saved. Recordings are useful for debugging but consume disk space.

---

## Memory considerations

Browser instances are memory-intensive. The figures below are approximate; actual usage depends on page complexity and browser settings.

| Concurrent tasks | Recommended RAM |
|------------------|-----------------|
| 1-2 | 4GB |
| 3-5 | 8GB |
| 6-10 | 16GB |
| More than 10 | 32GB+ |

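For capacity planning scripts, the table above can be expressed as a small lookup. This is a sketch under the same approximations as the table; the function name `recommended_ram_gb` is hypothetical, and it treats exactly 10 concurrent tasks as part of the 6-10 tier.

```python
def recommended_ram_gb(concurrent_tasks: int) -> int:
    """Return the approximate RAM guideline (in GB) for a number of concurrent tasks."""
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    if concurrent_tasks <= 2:
        return 4
    if concurrent_tasks <= 5:
        return 8
    if concurrent_tasks <= 10:
        return 16
    # The top tier is a minimum ("32GB+"), not an upper bound.
    return 32


for tasks in (1, 4, 10, 25):
    print(tasks, "tasks ->", recommended_ram_gb(tasks), "GB")
```
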
If you experience out-of-memory errors:
1. Reduce `BROWSER_MAX_PAGES_NUMBER`
2. Use a smaller viewport (`BROWSER_WIDTH`, `BROWSER_HEIGHT`)
3. Run in headless mode (`BROWSER_TYPE=chromium-headless`)
4. Limit concurrent task execution

---

## Scaling browsers

The default Docker Compose setup runs one browser instance inside the Skyvern container. For higher concurrency:

### Option 1: Vertical scaling

Add more RAM to your server and increase `BROWSER_MAX_PAGES_NUMBER`. Raise `MAX_STEPS_PER_RUN` only if longer tasks need more steps; it caps the steps of a single task rather than the number of tasks that run at once.

### Option 2: Horizontal scaling

Deploy multiple Skyvern instances behind a load balancer. Each instance runs its own browser. See [Kubernetes Deployment](/self-hosted/kubernetes) for orchestrated scaling.

### Option 3: External browser pool

Use a browser pool service like Browserless or your own Playwright grid, then connect via CDP:

```bash .env
BROWSER_TYPE=cdp-connect
BROWSER_REMOTE_DEBUGGING_URL=http://browserless:3000/
```

---

## Next steps

<CardGroup cols={2}>
<Card title="Proxy Setup" icon="shield" href="/self-hosted/proxy">
Configure proxies to avoid bot detection
</Card>
<Card title="Storage Configuration" icon="hard-drive" href="/self-hosted/storage">
Store recordings and artifacts in S3 or Azure Blob
</Card>
</CardGroup>
390
docs/self-hosted/docker.mdx
Normal file
@@ -0,0 +1,390 @@
---
title: Docker Setup
subtitle: Get Skyvern running in 10 minutes with Docker Compose
slug: self-hosted/docker
---

This guide walks you through deploying Skyvern using Docker Compose. By the end, you'll have a working Skyvern instance with the API server, browser, database, and web UI running on your machine.

## Prerequisites

- Docker and Docker Compose v2 installed ([Get Docker](https://docs.docker.com/get-docker/)). All commands use `docker compose` (with a space). If you have an older installation, replace with `docker-compose`.
- 4GB+ RAM available
- An LLM API key (OpenAI, Anthropic, Azure, Gemini, or Bedrock)

## Quick start

### 1. Clone the repository

```bash
git clone https://github.com/Skyvern-AI/skyvern.git
cd skyvern
```

### 2. Configure environment variables

Copy the example environment file and configure your LLM provider:

```bash
cp .env.example .env
```

Open `.env` and set your LLM provider. Here's an example for OpenAI:

```bash .env
ENABLE_OPENAI=true
OPENAI_API_KEY=sk-your-api-key-here
LLM_KEY=OPENAI_GPT4O
```

For other providers, see [LLM Configuration](/self-hosted/llm-configuration).

### 3. Configure the frontend

Copy the frontend environment file:

```bash
cp skyvern-frontend/.env.example skyvern-frontend/.env
```

The default values work for local development. You'll update `VITE_SKYVERN_API_KEY` after the first startup.

### 4. Start the services

```bash
docker compose up -d
```

This pulls the images and starts three services:
- **postgres**: Database on port 5432 (internal only)
- **skyvern**: API server on port 8000
- **skyvern-ui**: Web interface on port 8080

First startup takes 1-2 minutes as it runs database migrations and creates your organization.

### 5. Get your API key

Wait for all services to be healthy before continuing:

```bash
docker compose ps
```

All three services should show `healthy` in the STATUS column. The credentials used below are generated by the `skyvern` container during this first startup, so they do not exist until it is healthy.

Once healthy, retrieve your API key:

```bash
cat .streamlit/secrets.toml
```

This file is auto-generated by Skyvern on first startup. The `.streamlit` path is a legacy artifact; the credentials inside are standard Skyvern API keys.

You'll see output like:

```toml
[skyvern]
configs = [
    {env = "local", host = "http://skyvern:8000/api/v1", orgs = [{name="Skyvern", cred="eyJhbGciOiJIUzI1..."}]}
]
```

The `host` value uses the Docker-internal hostname `skyvern`. From your machine, use `http://localhost:8000` instead. You only need the `cred` value, which is your API key.

### 6. Update frontend configuration

Add your API key to the frontend environment:

```bash skyvern-frontend/.env
VITE_SKYVERN_API_KEY=eyJhbGciOiJIUzI1...
```

Restart the UI to pick up the change:

```bash
docker compose restart skyvern-ui
```

### 7. Verify the installation

Open [http://localhost:8080](http://localhost:8080) in your browser. You should see the Skyvern dashboard.

Test the API by listing workflows, which should return an empty array on a fresh install:

```bash
curl -s http://localhost:8000/v1/workflows \
  -H "x-api-key: YOUR_API_KEY_HERE"
```

<Note>
The API accepts requests on both `/v1/` and `/api/v1/`. The frontend uses `/api/v1` for backward compatibility. New integrations should use `/v1/`.
</Note>

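For integrations written in Python, the same request can be built with the standard library. This is a minimal sketch that only constructs the request, using the `/v1/workflows` path and `x-api-key` header from the curl example; `workflows_request` is a hypothetical helper name, and the key shown is a placeholder.

```python
from urllib.request import Request


def workflows_request(api_key: str, base_url: str = "http://localhost:8000") -> Request:
    """Build (but do not send) a GET request for the workflow list."""
    url = base_url.rstrip("/") + "/v1/workflows"
    return Request(url, headers={"x-api-key": api_key}, method="GET")


req = workflows_request("YOUR_API_KEY_HERE")
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen` and decode the response body with `json.loads`. `urllib` stores the header name as `X-api-key`; HTTP header names are case-insensitive, so the server sees the same header as in the curl example.
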
---

## Understanding the services

The Docker Compose file defines three services that work together:

```mermaid
flowchart LR
    UI[skyvern-ui<br/>:8080] --> API[skyvern<br/>:8000]
    API --> DB[(postgres<br/>:5432)]
    API --> Browser[Browser<br/>internal]
    API --> LLM[Your LLM<br/>Provider]
```

| Service | Image | Ports | Purpose |
|---------|-------|-------|---------|
| `postgres` | postgres:14-alpine | 5432 (internal) | Stores tasks, workflows, credentials, and run history |
| `skyvern` | public.ecr.aws/skyvern/skyvern | 8000, 9222 | API server + embedded browser |
| `skyvern-ui` | public.ecr.aws/skyvern/skyvern-ui | 8080, 9090 | Web dashboard and artifact server |

The `skyvern` container includes Playwright with Chromium. The browser runs inside the same container as the API server. No separate browser service is needed.

### Data volumes

Docker Compose mounts several directories for persistent storage:

| Local path | Container path | Contents |
|------------|----------------|----------|
| `./postgres-data` | `/var/lib/postgresql/data` | Database files |
| `./artifacts` | `/data/artifacts` | Extracted data, screenshots |
| `./videos` | `/data/videos` | Browser session recordings |
| `./har` | `/data/har` | HTTP Archive files for debugging |
| `./log` | `/data/log` | Application logs |
| `./.streamlit` | `/app/.streamlit` | Generated API credentials |

---

## Environment variables reference

The `.env` file controls the Skyvern server. Here are the most important variables grouped by purpose.

### LLM Configuration

Configure at least one LLM provider. See [LLM Configuration](/self-hosted/llm-configuration) for all providers.

```bash
# OpenAI
ENABLE_OPENAI=true
OPENAI_API_KEY=sk-...
LLM_KEY=OPENAI_GPT4O

# Or Anthropic
ENABLE_ANTHROPIC=true
ANTHROPIC_API_KEY=sk-ant-...
LLM_KEY=ANTHROPIC_CLAUDE3.5_SONNET
```

### Browser settings

```bash
# Browser mode: chromium-headful (visible) or chromium-headless (no display)
BROWSER_TYPE=chromium-headful

# Timeout for individual browser actions (milliseconds)
BROWSER_ACTION_TIMEOUT_MS=5000

# Where to save recordings
VIDEO_PATH=./videos
```

### Task execution

```bash
# Maximum steps before a task times out
MAX_STEPS_PER_RUN=50

# Server port
PORT=8000

# Log verbosity: DEBUG, INFO, WARNING, ERROR
LOG_LEVEL=INFO
```

### Database

The database connection is set in `docker-compose.yml`, not `.env`:

```yaml
environment:
  - DATABASE_STRING=postgresql+psycopg://skyvern:skyvern@postgres:5432/skyvern
```

To use an external database, update this connection string and remove the `postgres` service.

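Before pointing Skyvern at an external database, it can help to confirm what a connection string actually contains. The sketch below splits a `DATABASE_STRING` into its parts with the standard library; `describe_database_string` is a hypothetical helper, and the password is deliberately left out of the result so it does not end up in logs.

```python
from urllib.parse import urlsplit


def describe_database_string(database_string: str) -> dict:
    """Split a connection URL into its parts, omitting the password."""
    parts = urlsplit(database_string)
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not specify a port
        "database": parts.path.lstrip("/"),
    }


print(describe_database_string(
    "postgresql+psycopg://skyvern:skyvern@postgres:5432/skyvern"
))
```

In the default string the host is `postgres`, the Compose service name. For an external database this becomes the hostname or IP address that the `skyvern` container can reach.
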
---

## Common operations

### View logs

```bash
# All services
docker compose logs -f

# Specific service
docker compose logs -f skyvern
```

### Restart after configuration changes

```bash
docker compose restart skyvern
```

### Stop all services

```bash
docker compose down
```

### Update to latest version

```bash
docker compose pull
docker compose up -d
```

### Reset everything (including database)

<Warning>
This deletes all data including task history, credentials, and recordings.
</Warning>

```bash
docker compose down -v
rm -rf postgres-data artifacts videos har log .streamlit
docker compose up -d
```

---

## Exposing to the network

By default, Skyvern only accepts connections from localhost. To expose it on your network:

### Option 1: Bind to all interfaces

Edit `docker-compose.yml` to change the port bindings:

```yaml
services:
  skyvern:
    ports:
      - "0.0.0.0:8000:8000"  # Accept connections from any IP
```

### Option 2: Use a reverse proxy

For production deployments, put Skyvern behind nginx or Traefik:

```
your-domain.com → nginx → localhost:8080 (UI)
api.your-domain.com → nginx → localhost:8000 (API)
artifacts.your-domain.com → nginx → localhost:9090 (artifact server)
```

Update the frontend environment to use your domain:

```bash skyvern-frontend/.env
VITE_API_BASE_URL=https://api.your-domain.com/api/v1
VITE_WSS_BASE_URL=wss://api.your-domain.com/api/v1
VITE_ARTIFACT_API_BASE_URL=https://artifacts.your-domain.com
```

<Warning>
If exposing Skyvern to the internet, add authentication at the reverse proxy layer or use a VPN.
</Warning>

---

## Troubleshooting

<Accordion title="Container exits immediately">
Check the logs for the failing service:

```bash
docker compose logs skyvern
```

Common causes:
- Missing or invalid LLM API key: look for LLM-related errors in logs
- Database connection failed: check if `postgres` service is healthy with `docker compose ps`
</Accordion>

<Accordion title='API returns "Invalid credentials" (403)'>
The API key is missing, malformed, or doesn't match the organization. Verify:

1. The `x-api-key` header is included in your request
2. The key matches exactly what's in `.streamlit/secrets.toml`
3. No extra whitespace or newlines in the key

```bash
# Correct format
curl -H "x-api-key: eyJhbGciOiJIUzI1..." http://localhost:8000/v1/workflows
```
</Accordion>

<Accordion title='API returns "Could not validate credentials" (403)'>
The API key format is invalid (JWT decode failed). This usually means:
- The key was truncated or corrupted during copy-paste
- You're using an API key from a different Skyvern installation

Regenerate credentials by deleting the generated file and restarting the backend:

```bash
rm -rf .streamlit
docker compose restart skyvern
cat .streamlit/secrets.toml  # Get new key
```
</Accordion>

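Because the API key is a JWT, a truncated or corrupted copy can often be spotted locally before regenerating anything. The sketch below checks only the shape of the token: three dot-separated segments and a header that decodes to JSON with an `alg` field. It does not verify the signature, and `decode_jwt_header` and `looks_like_jwt` are hypothetical helper names.

```python
import base64
import json


def decode_jwt_header(token: str) -> dict:
    """Decode the header segment of a JWT without verifying the signature."""
    segments = token.strip().split(".")
    if len(segments) != 3 or not all(segments):
        raise ValueError("expected three non-empty dot-separated segments")
    header_b64 = segments[0]
    # base64url data may omit padding; restore it before decoding.
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    header = json.loads(base64.urlsafe_b64decode(padded))
    if not isinstance(header, dict):
        raise ValueError("JWT header is not a JSON object")
    return header


def looks_like_jwt(token: str) -> bool:
    """Return True if the token has a plausible JWT shape."""
    try:
        return "alg" in decode_jwt_header(token)
    except ValueError:
        return False


def _b64url(data: bytes) -> str:
    """Encode bytes as unpadded base64url, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


# A structurally valid sample token with a dummy signature (not a real key).
sample = ".".join([
    _b64url(b'{"alg":"HS256","typ":"JWT"}'),
    _b64url(b'{"sub":"demo"}'),
    "signature",
])
print(looks_like_jwt(sample))                # True
print(looks_like_jwt(sample.split(".")[0]))  # False: only one segment survived
```

A key that passes this check can still be rejected by the server, for example when it was issued by a different installation. A key that fails it was almost certainly damaged while being copied.
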
<Accordion title="UI shows blank page or connection errors">
Check that `skyvern-frontend/.env` has the correct values:

```bash
VITE_API_BASE_URL=http://localhost:8000/api/v1
VITE_WSS_BASE_URL=ws://localhost:8000/api/v1
VITE_SKYVERN_API_KEY=<your-key-from-secrets.toml>
```

After updating, restart the UI:

```bash
docker compose restart skyvern-ui
```
</Accordion>

<Accordion title='Task fails with "Unknown browser type"'>
The `BROWSER_TYPE` environment variable has an invalid value. Valid options:
- `chromium-headful`: Browser with visible window (default)
- `chromium-headless`: No visible window
- `cdp-connect`: Connect to external Chrome
</Accordion>

<Accordion title='Task fails with "Context window exceeded"'>
The page content is too large for the LLM. Try:
- Simplifying your prompt
- Starting from a more specific URL
- Using a model with a larger context window
</Accordion>

---

## Next steps

<CardGroup cols={2}>
<Card title="LLM Configuration" icon="microchip" href="/self-hosted/llm-configuration">
Configure OpenAI, Anthropic, Azure, Ollama, and other providers
</Card>
<Card title="Browser Configuration" icon="window" href="/self-hosted/browser">
Customize browser settings, locales, and connect to external Chrome
</Card>
<Card title="Storage Configuration" icon="hard-drive" href="/self-hosted/storage">
Store artifacts in S3 or Azure Blob instead of local disk
</Card>
<Card title="Proxy Setup" icon="shield" href="/self-hosted/proxy">
Configure proxies to avoid bot detection
</Card>
</CardGroup>
441
docs/self-hosted/kubernetes.mdx
Normal file
@@ -0,0 +1,441 @@
---
title: Kubernetes Deployment
subtitle: Deploy Skyvern at scale with Kubernetes manifests
slug: self-hosted/kubernetes
---

This guide walks through deploying Skyvern on Kubernetes for production environments. Use this when you need horizontal scaling, high availability, or integration with existing Kubernetes infrastructure.

<Warning>
Do not expose this deployment to the public internet without adding authentication at the ingress layer.
</Warning>

## Prerequisites

- A running Kubernetes cluster (1.19+)
- `kubectl` configured to access your cluster
- An ingress controller (the manifests use Traefik, but any controller works)
- An LLM API key (OpenAI, Anthropic, Azure, etc.)

## Architecture overview

The Kubernetes deployment creates three services:

```mermaid
flowchart LR
    subgraph Kubernetes Cluster
        Ingress[Ingress] --> FE[Frontend<br/>:8080]
        Ingress --> BE[Backend<br/>:8000]
        Ingress --> Art[Artifacts<br/>:9090]
        BE --> DB[(PostgreSQL<br/>:5432)]
        BE --> Browser[Browser<br/>embedded]
        FE --> Art
    end

    BE --> LLM[LLM Provider]
```

| Component | Service | Purpose |
|-----------|---------|---------|
| Backend | `skyvern-backend` | API server + embedded browser |
| Frontend | `skyvern-frontend` | Web UI + artifact server |
| PostgreSQL | `postgres` | Database for tasks, workflows, credentials |

---

## Quick start

### 1. Clone the repository

```bash
git clone https://github.com/Skyvern-AI/skyvern.git
cd skyvern/kubernetes-deployment
```

### 2. Configure backend secrets

Edit `backend/backend-secrets.yaml` with your LLM provider credentials:

```yaml backend/backend-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: skyvern-backend-env
  namespace: skyvern
type: Opaque
stringData:
  ENV: local

  # LLM Configuration - set your provider
  ENABLE_OPENAI: "true"
  OPENAI_API_KEY: "sk-your-api-key-here"
  LLM_KEY: "OPENAI_GPT4O"

  # Database - points to the PostgreSQL service
  DATABASE_STRING: "postgresql+psycopg://skyvern:skyvern@postgres/skyvern"

  # Browser settings
  BROWSER_TYPE: "chromium-headless"
  BROWSER_ACTION_TIMEOUT_MS: "5000"
  MAX_STEPS_PER_RUN: "50"

  # Server
  PORT: "8000"
  LOG_LEVEL: "INFO"
```

For other LLM providers, see [LLM Configuration](/self-hosted/llm-configuration).

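The manifest uses `stringData`, which accepts plain strings; Kubernetes stores them base64-encoded under `data`. That is also why values such as `"true"` and `"8000"` are quoted: every `stringData` value must be a string. The sketch below illustrates the encoding so you can recognize your values when inspecting the stored Secret; `to_secret_data` is a hypothetical helper, not part of any Kubernetes client.

```python
import base64


def to_secret_data(string_data: dict[str, str]) -> dict[str, str]:
    """Base64-encode each value the way a Secret's `data` field stores it."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in string_data.items()
    }


def from_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Decode a Secret's `data` field back to plain strings."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in data.items()
    }


encoded = to_secret_data({"LLM_KEY": "OPENAI_GPT4O", "PORT": "8000"})
print(encoded)
print(from_secret_data(encoded))
```

Base64 is an encoding, not encryption. Anyone who can read the Secret can recover the API key, so restrict access to the `skyvern` namespace accordingly.
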
### 3. Configure frontend secrets

Edit `frontend/frontend-secrets.yaml`:

```yaml frontend/frontend-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: skyvern-frontend-env
  namespace: skyvern
type: Opaque
stringData:
  VITE_API_BASE_URL: "http://skyvern.example.com/api/v1"
  VITE_WSS_BASE_URL: "ws://skyvern.example.com/api/v1"
  VITE_ARTIFACT_API_BASE_URL: "http://skyvern.example.com/artifacts"
  VITE_SKYVERN_API_KEY: ""  # Leave empty for initial deploy
```

Replace `skyvern.example.com` with your actual domain. If the ingress serves traffic over TLS, as the `websecure` entrypoint in the next step implies, use `https://` and `wss://` in these URLs so the browser does not block them as mixed content.

### 4. Configure ingress

Edit `ingress.yaml` with your domain and TLS settings:

```yaml ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: skyvern-ingress
  namespace: skyvern
  annotations:
    # Adjust for your ingress controller
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
  ingressClassName: traefik  # Change to nginx, kong, etc.
  rules:
    - host: skyvern.example.com  # Your domain
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: skyvern-backend
                port:
                  number: 8000
          - path: /artifacts
            pathType: Prefix
            backend:
              service:
                name: skyvern-frontend
                port:
                  number: 9090
          - path: /
            pathType: Prefix
            backend:
              service:
                name: skyvern-frontend
                port:
                  number: 8080
```

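To reason about where a given request ends up, the three `Prefix` rules above can be modelled in a few lines. This is a simplified sketch of Kubernetes `pathType: Prefix` semantics (matching on whole path elements, longest matching path wins), not the behaviour of any specific ingress controller; `RULES`, `prefix_matches`, and `route` are hypothetical names.

```python
# (path, service, port) tuples copied from the ingress rules above.
RULES = [
    ("/api", "skyvern-backend", 8000),
    ("/artifacts", "skyvern-frontend", 9090),
    ("/", "skyvern-frontend", 8080),
]


def prefix_matches(rule_path: str, request_path: str) -> bool:
    """Match on whole path elements: /api matches /api/v1 but not /apiary."""
    if rule_path == "/":
        return True
    rule = rule_path.rstrip("/")
    return request_path == rule or request_path.startswith(rule + "/")


def route(request_path: str) -> tuple[str, int]:
    """Return the (service, port) chosen by the longest matching rule."""
    candidates = [rule for rule in RULES if prefix_matches(rule[0], request_path)]
    _, service, port = max(candidates, key=lambda rule: len(rule[0]))
    return service, port


for path in ("/api/v1/workflows", "/artifacts/run.png", "/workflows", "/v1/workflows"):
    print(path, "->", route(path))
```

Under this model, a request to `/v1/workflows` falls through to the frontend rule because only `/api` is routed to the backend. Clients that go through this ingress would therefore need the `/api/v1` prefix unless you add a rule for `/v1`.
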
### 5. Deploy
|
||||||
|
|
||||||
|
Run the deployment script:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
chmod +x k8s-deploy.sh
|
||||||
|
./k8s-deploy.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
This applies manifests in order:
|
||||||
|
1. Namespace
|
||||||
|
2. PostgreSQL (secrets, storage, deployment, service)
|
||||||
|
3. Backend (secrets, deployment, service)
|
||||||
|
4. Frontend (secrets, deployment, service)
|
||||||
|
5. Ingress
|
||||||
|
|
||||||
|
### 6. Verify deployment

Check that all pods are running:

```bash
kubectl get pods -n skyvern
```

Expected output:

```
NAME                   READY   STATUS    RESTARTS   AGE
postgres-xxx           1/1     Running   0          2m
skyvern-backend-xxx    1/1     Running   0          1m
skyvern-frontend-xxx   1/1     Running   0          30s
```

The backend pod takes 1-2 minutes to become ready as it runs database migrations.

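If you would rather check readiness from a script than read the table by eye, a small filter over the `kubectl get pods` output is enough. The helper name `not_ready_pods` is hypothetical, and the sketch assumes the default column layout shown above (NAME, READY, STATUS).

```bash
# Read `kubectl get pods` output on stdin and print the names of pods
# whose READY column is not N/N or whose STATUS is not Running.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}
```

Use it as `kubectl get pods -n skyvern | not_ready_pods`; empty output means every pod is ready. To block until the backend finishes starting, `kubectl rollout status deployment/skyvern-backend -n skyvern --timeout=180s` is an alternative.
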
### 7. Get your API key

Wait for the backend pod to show `1/1` in the `READY` column of `kubectl get pods -n skyvern` before running this command. The API key file is generated during startup and won't exist until the pod is ready.

```bash
kubectl exec -n skyvern deployment/skyvern-backend -- cat /app/.streamlit/secrets.toml
```

Copy the `cred` value and update `frontend/frontend-secrets.yaml`:

```yaml
VITE_SKYVERN_API_KEY: "eyJhbGciOiJIUzI1..."
```

Reapply the frontend secrets and restart:

```bash
kubectl apply -f frontend/frontend-secrets.yaml -n skyvern
kubectl rollout restart deployment/skyvern-frontend -n skyvern
```

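Copying the `cred` value by hand is easy to get wrong, so here is a minimal sketch for pulling it out of the file. It assumes the token appears on one line as `cred="..."` or `cred = "..."`; the exact layout of `secrets.toml` is not shown in this guide, so check the output against the file. The function name `extract_cred` is hypothetical.

```bash
# Read secrets.toml on stdin and print the first cred value found.
# Assumes the token is written as cred="..." or cred = "..." on a single line.
extract_cred() {
  sed -n 's/.*cred[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

Pipe the `kubectl exec ... cat /app/.streamlit/secrets.toml` command from this step into `extract_cred` to print just the key.
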
### 8. Access the UI

Navigate to your configured domain (e.g., `https://skyvern.example.com`). You should see the Skyvern dashboard.

---

## Manifest structure

```
kubernetes-deployment/
├── namespace.yaml                   # Creates 'skyvern' namespace
├── k8s-deploy.sh                    # Deployment script
├── ingress.yaml                     # Ingress configuration
├── backend/
│   ├── backend-secrets.yaml         # Environment variables
│   ├── backend-deployment.yaml      # Pod spec
│   └── backend-service.yaml         # ClusterIP service
├── frontend/
│   ├── frontend-secrets.yaml        # Environment variables
│   ├── frontend-deployment.yaml     # Pod spec
│   └── frontend-service.yaml        # ClusterIP service
└── postgres/
    ├── postgres-secrets.yaml        # Database credentials
    ├── postgres-storage.yaml        # PersistentVolumeClaim
    ├── postgres-deployment.yaml     # Pod spec
    └── postgres-service.yaml        # ClusterIP service
```

---

## Storage configuration

By default, the manifests use `hostPath` volumes. This works for single-node clusters but isn't suitable for multi-node production deployments.

### Using PersistentVolumeClaims

For production, replace `hostPath` with PVCs. Edit `backend/backend-deployment.yaml`:

```yaml
volumes:
  - name: artifacts
    persistentVolumeClaim:
      claimName: skyvern-artifacts-pvc
  - name: videos
    persistentVolumeClaim:
      claimName: skyvern-videos-pvc
```

Create the PVCs:

```yaml skyvern-storage.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: skyvern-artifacts-pvc
  namespace: skyvern
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: skyvern-videos-pvc
  namespace: skyvern
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```

### Using S3 or Azure Blob

For cloud storage, configure the backend environment variables instead of mounting volumes. See [Storage Configuration](/self-hosted/storage).

---

## Scaling

### Horizontal scaling

To run multiple backend instances, increase the replica count:

```yaml backend/backend-deployment.yaml
spec:
  replicas: 3 # Run 3 backend pods
```

Each pod runs its own browser instance. Tasks are distributed across pods.

<Note>
When scaling horizontally, ensure your storage backend supports concurrent access (S3, Azure Blob, or ReadWriteMany PVCs). A ReadWriteOnce PVC can be mounted by only one node at a time, so local storage won't work once backend pods are scheduled on different nodes.
</Note>

### Resource limits

Add resource limits to prevent pods from consuming excessive resources:

```yaml
containers:
  - name: skyvern-backend
    resources:
      requests:
        memory: "2Gi"
        cpu: "500m"
      limits:
        memory: "4Gi"
        cpu: "2000m"
```

Browser instances need significant memory. Start with 2GB minimum per pod.

---

## TLS configuration

To enable HTTPS, uncomment the TLS section in `ingress.yaml`:

```yaml
spec:
  tls:
    - hosts:
        - skyvern.example.com
      secretName: skyvern-tls-secret
```

Create the TLS secret:

```bash
kubectl create secret tls skyvern-tls-secret \
  --cert=path/to/tls.crt \
  --key=path/to/tls.key \
  -n skyvern
```

Or use cert-manager for automatic certificate management.

Update frontend secrets to use `https` and `wss`:

```yaml
VITE_API_BASE_URL: "https://skyvern.example.com/api/v1"
VITE_WSS_BASE_URL: "wss://skyvern.example.com/api/v1"
```

---

## Using an external database

For production, consider using a managed PostgreSQL service (RDS, Cloud SQL, Azure Database).

1. Remove the `postgres/` manifests from the deployment
2. Update `backend/backend-secrets.yaml`:

   ```yaml
   DATABASE_STRING: "postgresql+psycopg://user:password@your-db-host:5432/skyvern"
   ```

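Managed databases often generate passwords containing characters such as `@`, `:` or `/`, which must be percent-encoded inside a connection URL. The sketch below builds a `DATABASE_STRING` in the format shown above. The helper names `encode_userinfo` and `database_string` are hypothetical, and the encoder covers only a minimal set of reserved characters rather than the full URL specification. For example, `database_string skyvern 'p@ss:w/rd' your-db-host 5432 skyvern` yields a URL whose password part reads `p%40ss%3Aw%2Frd`.

```bash
# Percent-encode the characters that would otherwise be read as URL syntax.
# Handles only % @ : / ? # (a minimal set, not a complete URL encoder).
encode_userinfo() {
  printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/@/%40/g' -e 's/:/%3A/g' \
    -e 's#/#%2F#g' -e 's/?/%3F/g' -e 's/#/%23/g'
}

# Build a DATABASE_STRING from user, password, host, port, and database name.
database_string() {
  user=$(encode_userinfo "$1")
  pass=$(encode_userinfo "$2")
  printf 'postgresql+psycopg://%s:%s@%s:%s/%s\n' "$user" "$pass" "$3" "$4" "$5"
}
```
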
---

## Troubleshooting

### Pods stuck in Pending

Check for resource constraints:

```bash
kubectl describe pod -n skyvern <pod-name>
```

Common causes:
- Insufficient node resources
- PersistentVolume not available
- Image pull errors

### Backend crashes on startup

Check the logs:

```bash
kubectl logs -n skyvern deployment/skyvern-backend
```

Common causes:
- Invalid LLM API key
- Database connection failed
- Missing environment variables

### Frontend shows "Unauthorized"

The API key in frontend secrets doesn't match the generated key. Re-copy it from the backend pod.

### Ingress not routing correctly

Verify your ingress controller is running and the ingress resource is configured:

```bash
kubectl get ingress -n skyvern
kubectl describe ingress skyvern-ingress -n skyvern
```

---

## Cleanup

To remove the entire deployment:

```bash
kubectl delete namespace skyvern
```

This removes all resources in the `skyvern` namespace.

To clean up host storage (if using hostPath):

```bash
rm -rf /data/artifacts /data/videos /data/har /data/log /app/.streamlit
```

---

## Next steps

<CardGroup cols={2}>
  <Card title="Storage Configuration" icon="hard-drive" href="/self-hosted/storage">
    Configure S3 or Azure Blob for artifact storage
  </Card>
  <Card title="LLM Configuration" icon="microchip" href="/self-hosted/llm-configuration">
    Configure additional LLM providers
  </Card>
</CardGroup>
336
docs/self-hosted/llm-configuration.mdx
Normal file
@@ -0,0 +1,336 @@
---
title: LLM Configuration
subtitle: Connect your preferred language model provider
slug: self-hosted/llm-configuration
---

Skyvern uses LLMs to analyze screenshots and decide what actions to take. You'll need to configure at least one LLM provider before running tasks.

## How Skyvern uses LLMs

Skyvern makes multiple LLM calls per task step:
1. **Screenshot analysis**: Identify interactive elements on the page
2. **Action planning**: Decide what to click, type, or extract
3. **Result extraction**: Parse data from the page into structured output

A task that runs for 10 steps makes roughly 30+ LLM calls. Choose your provider and model tier with this in mind.

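The estimate above is simple multiplication, sketched here so you can plug in your own step counts. The function name `estimate_llm_calls` is hypothetical, and three calls per step is a lower bound taken from the list above, not a measured figure.

```bash
# Rough estimate: about three LLM calls (analysis, planning, extraction) per step.
# The second argument overrides the calls-per-step assumption.
estimate_llm_calls() {
  steps="$1"
  calls_per_step="${2:-3}"
  echo $((steps * calls_per_step))
}

estimate_llm_calls 10   # prints 30
```

Multiply the result by your provider's per-call cost to get a rough budget per task.
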
For most deployments, configure a single provider using `LLM_KEY`. Skyvern also supports a `SECONDARY_LLM_KEY` for lighter tasks to reduce costs.

---

## OpenAI

The most common choice. Requires an API key from [platform.openai.com](https://platform.openai.com/).

```bash .env
ENABLE_OPENAI=true
OPENAI_API_KEY=sk-...
LLM_KEY=OPENAI_GPT4O
```

### Available models

| LLM_KEY | Model | Notes |
|---------|-------|-------|
| `OPENAI_GPT4O` | gpt-4o | Recommended for most use cases |
| `OPENAI_GPT4O_MINI` | gpt-4o-mini | Cheaper, less capable |
| `OPENAI_GPT4_1` | gpt-4.1 | Latest GPT-4 family |
| `OPENAI_GPT4_1_MINI` | gpt-4.1-mini | Cheaper GPT-4.1 variant |
| `OPENAI_O3` | o3 | Reasoning model |
| `OPENAI_O3_MINI` | o3-mini | Cheaper reasoning model |
| `OPENAI_GPT4_TURBO` | gpt-4-turbo | Previous generation |
| `OPENAI_GPT4V` | gpt-4-turbo | Legacy alias for gpt-4-turbo |

### Optional settings

```bash .env
# Use a custom API endpoint (for proxies or compatible services)
OPENAI_API_BASE=https://your-proxy.com/v1

# Specify organization ID
OPENAI_ORGANIZATION=org-...
```

---

## Anthropic

Claude models from [anthropic.com](https://www.anthropic.com/).

```bash .env
ENABLE_ANTHROPIC=true
ANTHROPIC_API_KEY=sk-ant-...
LLM_KEY=ANTHROPIC_CLAUDE3.5_SONNET
```

### Available models

| LLM_KEY | Model | Notes |
|---------|-------|-------|
| `ANTHROPIC_CLAUDE4.5_SONNET` | claude-4.5-sonnet | Latest Sonnet |
| `ANTHROPIC_CLAUDE4.5_OPUS` | claude-4.5-opus | Most capable |
| `ANTHROPIC_CLAUDE4_SONNET` | claude-4-sonnet | Claude 4 |
| `ANTHROPIC_CLAUDE4_OPUS` | claude-4-opus | Claude 4 Opus |
| `ANTHROPIC_CLAUDE3.7_SONNET` | claude-3-7-sonnet | Previous generation |
| `ANTHROPIC_CLAUDE3.5_SONNET` | claude-3-5-sonnet | Previous generation |
| `ANTHROPIC_CLAUDE3.5_HAIKU` | claude-3-5-haiku | Cheap and fast |

---

## Azure OpenAI

Microsoft-hosted OpenAI models. Requires an Azure subscription with OpenAI service provisioned.

```bash .env
ENABLE_AZURE=true
LLM_KEY=AZURE_OPENAI
AZURE_DEPLOYMENT=your-deployment-name
AZURE_API_KEY=your-azure-api-key
AZURE_API_BASE=https://your-resource.openai.azure.com/
AZURE_API_VERSION=2024-08-01-preview
```

### Setup steps

1. Create an Azure OpenAI resource in the [Azure Portal](https://portal.azure.com)
2. Open the Azure AI Foundry portal from your resource's overview page
3. Go to **Shared Resources** → **Deployments**
4. Click **Deploy Model** → **Deploy Base Model** → select GPT-4o or GPT-4
5. Note the **Deployment Name**. Use this for `AZURE_DEPLOYMENT`
6. Copy your API key and endpoint from the Azure Portal

<Note>
The `AZURE_DEPLOYMENT` is the name you chose when deploying the model, not the model name itself.
</Note>

---

## Google Gemini

Gemini models through [Vertex AI](https://cloud.google.com/vertex-ai). Requires a GCP project with Vertex AI enabled.

```bash .env
ENABLE_VERTEX_AI=true
LLM_KEY=VERTEX_GEMINI_3.0_FLASH
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
GCP_PROJECT_ID=your-gcp-project-id
GCP_REGION=us-central1
```

### Setup steps

1. Create a [GCP project](https://console.cloud.google.com/) with billing enabled
2. Enable the **Vertex AI API** in your project
3. Create a service account with the **Vertex AI User** role
4. Download the service account JSON key file
5. Set `GOOGLE_APPLICATION_CREDENTIALS` to the path of that file

### Available models

| LLM_KEY | Model | Notes |
|---------|-------|-------|
| `VERTEX_GEMINI_3.0_FLASH` | gemini-3-flash-preview | Recommended |
| `VERTEX_GEMINI_2.5_PRO` | gemini-2.5-pro | Stable |
| `VERTEX_GEMINI_2.5_FLASH` | gemini-2.5-flash | Cheaper, faster |

---

## Amazon Bedrock

Run Anthropic Claude through your AWS account.

```bash .env
ENABLE_BEDROCK=true
LLM_KEY=BEDROCK_ANTHROPIC_CLAUDE3.5_SONNET
AWS_REGION=us-west-2
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
```

### Setup steps

1. Create an IAM user with `AmazonBedrockFullAccess` policy
2. Generate access keys for the IAM user
3. In the [Bedrock console](https://console.aws.amazon.com/bedrock/), go to **Model Access**
4. Enable access to Claude 3.5 Sonnet

### Available models

| LLM_KEY | Model |
|---------|-------|
| `BEDROCK_ANTHROPIC_CLAUDE3.5_SONNET` | Claude 3.5 Sonnet v2 |
| `BEDROCK_ANTHROPIC_CLAUDE3.5_SONNET_V1` | Claude 3.5 Sonnet v1 |
| `BEDROCK_ANTHROPIC_CLAUDE3.7_SONNET_INFERENCE_PROFILE` | Claude 3.7 Sonnet (cross-region) |
| `BEDROCK_ANTHROPIC_CLAUDE4_SONNET_INFERENCE_PROFILE` | Claude 4 Sonnet (cross-region) |
| `BEDROCK_ANTHROPIC_CLAUDE4.5_SONNET_INFERENCE_PROFILE` | Claude 4.5 Sonnet (cross-region) |

<Note>
Bedrock inference profile keys (`*_INFERENCE_PROFILE`) use cross-region inference and require `AWS_REGION` only. No access keys needed if running on an IAM-authenticated instance.
</Note>

---

## Ollama (Local Models)

Run open-source models locally with [Ollama](https://ollama.ai/). No API costs, but requires sufficient local compute.

```bash .env
ENABLE_OLLAMA=true
LLM_KEY=OLLAMA
OLLAMA_MODEL=llama3.1
OLLAMA_SERVER_URL=http://host.docker.internal:11434
OLLAMA_SUPPORTS_VISION=false
```

### Setup steps

1. [Install Ollama](https://ollama.ai/download)
2. Pull a model: `ollama pull llama3.1`
3. Start Ollama: `ollama serve`
4. Configure Skyvern to connect

<Warning>
Most Ollama models don't support vision. Set `OLLAMA_SUPPORTS_VISION=false`. Without vision, Skyvern relies on DOM analysis instead of screenshot analysis, which may reduce accuracy on complex pages.
</Warning>

### Docker networking

When running Skyvern in Docker and Ollama on the host:

| Host OS | OLLAMA_SERVER_URL |
|---------|-------------------|
| macOS/Windows | `http://host.docker.internal:11434` |
| Linux | `http://172.17.0.1:11434` (Docker bridge IP) |

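The table above can be turned into a small helper that picks the URL from the host OS name. This is a sketch: `ollama_url_for` is a hypothetical name, and it assumes the default Docker bridge address on Linux, which a custom network setup may change.

```bash
# Map a host OS name (as printed by `uname -s`) to the URL from the table above.
ollama_url_for() {
  case "$1" in
    Linux) echo "http://172.17.0.1:11434" ;;          # default Docker bridge IP
    *)     echo "http://host.docker.internal:11434" ;; # macOS and Windows
  esac
}

# Print the line to place in .env for the machine this runs on.
echo "OLLAMA_SERVER_URL=$(ollama_url_for "$(uname -s)")"
```
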
---

## OpenAI-Compatible Endpoints

Connect to any service that implements the OpenAI API format, including LiteLLM, LocalAI, vLLM, and text-generation-inference.

```bash .env
ENABLE_OPENAI_COMPATIBLE=true
OPENAI_COMPATIBLE_MODEL_NAME=llama3.1
OPENAI_COMPATIBLE_API_KEY=sk-test
OPENAI_COMPATIBLE_API_BASE=http://localhost:4000/v1
LLM_KEY=OPENAI_COMPATIBLE
```

This is useful for:
- Running local models with a unified API
- Using LiteLLM as a proxy to switch between providers
- Connecting to self-hosted inference servers

---

## OpenRouter

Access multiple models through a single API at [openrouter.ai](https://openrouter.ai/).

```bash .env
ENABLE_OPENROUTER=true
LLM_KEY=OPENROUTER
OPENROUTER_API_KEY=sk-or-...
OPENROUTER_MODEL=mistralai/mistral-small-3.1-24b-instruct
```

---

## Groq

Inference on open-source models at [groq.com](https://groq.com/).

```bash .env
ENABLE_GROQ=true
LLM_KEY=GROQ
GROQ_API_KEY=gsk_...
GROQ_MODEL=llama-3.1-8b-instant
```

<Note>
Groq specializes in fast inference for open-source models. Response times are typically much faster than other providers, but model selection is limited.
</Note>

---

## Using multiple models

### Primary and secondary models

Configure a cheaper model for lightweight operations:

```bash .env
# Main model for complex decisions
LLM_KEY=OPENAI_GPT4O

# Cheaper model for simple tasks like dropdown selection
SECONDARY_LLM_KEY=OPENAI_GPT4O_MINI
```

### Task-specific models

For fine-grained control, you can override models for specific operations:

```bash .env
# Model for data extraction from pages (defaults to LLM_KEY if not set)
EXTRACTION_LLM_KEY=ANTHROPIC_CLAUDE3.5_SONNET

# Model for generating code/scripts in code blocks (defaults to LLM_KEY if not set)
SCRIPT_GENERATION_LLM_KEY=OPENAI_GPT4O
```

Most deployments don't need task-specific models. Start with `LLM_KEY` and `SECONDARY_LLM_KEY`.

---

## Troubleshooting

### "To enable svg shape conversion, please set the Secondary LLM key"

Some operations require a secondary model. Set `SECONDARY_LLM_KEY` in your environment:

```bash .env
SECONDARY_LLM_KEY=OPENAI_GPT4O_MINI
```

### "Context window exceeded"

The page content is too large for the model's context window. Options:
- Use a model with a larger context (GPT-4o supports 128k tokens)
- Simplify your prompt to require less page analysis
- Start from a more specific URL with less content

### "LLM caller not found"

The configured `LLM_KEY` doesn't match any enabled provider. Verify:
1. The provider is enabled (`ENABLE_OPENAI=true`, etc.)
2. The `LLM_KEY` value matches a supported model name exactly
3. Model names are case-sensitive: `OPENAI_GPT4O` not `openai_gpt4o`

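The first and third checks above can be automated with a short script over your `.env` file. This is a sketch under assumptions: the helper names are hypothetical, and the prefix-to-flag mapping is inferred from the examples in this guide, not read from Skyvern's source. It does not verify that the model name itself is supported.

```bash
# Print the value of a key from an env file (the last assignment wins).
env_value() {
  grep -E "^$2=" "$1" | tail -n 1 | cut -d= -f2-
}

# Map an LLM_KEY to the ENABLE_* flag used in this guide's examples.
# Matching is case-sensitive, so a lowercase key falls through to UNKNOWN.
enable_flag_for() {
  case "$1" in
    OPENAI_COMPATIBLE*) echo ENABLE_OPENAI_COMPATIBLE ;;
    OPENROUTER*)        echo ENABLE_OPENROUTER ;;
    OPENAI_*)           echo ENABLE_OPENAI ;;
    ANTHROPIC_*)        echo ENABLE_ANTHROPIC ;;
    AZURE_*)            echo ENABLE_AZURE ;;
    VERTEX_*)           echo ENABLE_VERTEX_AI ;;
    BEDROCK_*)          echo ENABLE_BEDROCK ;;
    OLLAMA*)            echo ENABLE_OLLAMA ;;
    GROQ*)              echo ENABLE_GROQ ;;
    *)                  echo UNKNOWN ;;
  esac
}

# Return 0 when the flag matching LLM_KEY is set to true in the env file.
check_llm_env() {
  key=$(env_value "$1" LLM_KEY)
  flag=$(enable_flag_for "$key")
  [ "$flag" != "UNKNOWN" ] && [ "$(env_value "$1" "$flag")" = "true" ]
}
```

Run `check_llm_env .env && echo ok` from the directory that holds your `.env` file.
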
### Container logs show authentication errors

Check your API key configuration:
- Ensure the key is set correctly without extra whitespace
- Verify the key hasn't expired or been revoked
- For Azure, ensure `AZURE_API_BASE` includes the full URL with `https://`

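Stray whitespace is hard to spot by eye, so a quick scan of the env file can help. The function name `suspicious_env_lines` is hypothetical; the sketch flags assignments with spaces around `=` or trailing whitespace, which also catches Windows-style line endings.

```bash
# Print "line-number:content" for assignments that have whitespace around "="
# or trailing whitespace (including a carriage return) after the value.
suspicious_env_lines() {
  grep -nE '^[A-Za-z_][A-Za-z0-9_]*([[:space:]]+=|=[[:space:]]+|=.*[[:space:]]+$)' "$1"
}
```

Run `suspicious_env_lines .env`; no output means no assignment matched.
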
### Slow response times

LLM calls typically take 2-10 seconds. Longer times may indicate:
- Network latency to the provider
- Rate limiting (the provider may be throttling requests)
- For Ollama, insufficient local compute resources

---

## Next steps

<CardGroup cols={2}>
  <Card title="Browser Configuration" icon="window" href="/self-hosted/browser">
    Configure browser modes, locales, and display settings
  </Card>
  <Card title="Docker Setup" icon="docker" href="/self-hosted/docker">
    Return to the main Docker setup guide
  </Card>
</CardGroup>
101
docs/self-hosted/overview.mdx
Normal file
@@ -0,0 +1,101 @@
---
title: Self-Hosted Overview
subtitle: Run Skyvern on your own infrastructure
slug: self-hosted/overview
---

Self-hosted Skyvern runs entirely on your infrastructure: your servers, your browsers, your LLM API keys. This guide helps you decide if self-hosting fits your needs and which deployment method to choose.

## Architecture

Self-hosted Skyvern has four components. The API server, database, and browser run on your infrastructure; the LLM provider is an external service unless you run a local model:

```mermaid
flowchart LR
    subgraph Your Infrastructure
        API[Skyvern API Server]
        DB[(PostgreSQL)]
        Browser[Browser Container]
    end

    LLM[LLM Provider]
    Sites[Target Websites]

    API <--> DB
    API <--> Browser
    API <--> LLM
    Browser <--> Sites
```

| Component | Role |
|-----------|------|
| **Skyvern API Server** | Orchestrates tasks, processes LLM responses, stores results |
| **PostgreSQL** | Stores task history, workflows, credentials, and organization data |
| **Browser Container** | Playwright-managed Chromium that executes the actual web automation |
| **LLM Provider** | Analyzes screenshots and determines actions. You provide the API key (OpenAI, Anthropic, Azure, or local via Ollama) |

### How a task executes

Skyvern runs a perception-action loop for each task step:

1. **Screenshot**: The browser captures the current page state
2. **Analyze**: The screenshot is sent to your LLM, which identifies interactive elements and decides the next action
3. **Execute**: Skyvern performs the action in the browser (click, type, scroll, extract data)
4. **Repeat**: Steps 1-3 loop until the task goal is met or the step limit (`MAX_STEPS_PER_RUN`) is reached

This loop is why LLM choice and browser configuration are the two most impactful self-hosting decisions. They affect every task step.

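The control flow of that loop can be sketched in a few lines. Everything here is a stand-in for illustration: `capture_screenshot`, `decide_action`, `execute_action`, and `run_task` are hypothetical names, not Skyvern APIs, and the fake "LLM" simply declares the goal met on the third step.

```bash
# Hypothetical stand-ins for the browser and the LLM (not Skyvern APIs).
capture_screenshot() { echo "screenshot-$1"; }
decide_action() { if [ "$2" -ge 3 ]; then echo "done"; else echo "click"; fi; }
execute_action() { :; }

# Loop: screenshot, analyze, execute, until the goal is met or the step limit is hit.
run_task() {
  max_steps="$1"
  step=1
  while [ "$step" -le "$max_steps" ]; do
    shot=$(capture_screenshot "$step")
    action=$(decide_action "$shot" "$step")
    if [ "$action" = "done" ]; then
      echo "completed after $step steps"
      return 0
    fi
    execute_action "$action"
    step=$((step + 1))
  done
  echo "step limit reached"
  return 1
}
```

With these stubs, `run_task 10` finishes on the third step, while `run_task 2` stops at the limit, which is the role `MAX_STEPS_PER_RUN` plays in a real deployment.
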
## What changes from Cloud

| You gain | You manage |
|----------|------------|
| Full data control: browser sessions and results stay on your network | Infrastructure: servers, scaling, uptime |
| Any LLM provider, including local models via Ollama | LLM API costs: pay your provider directly |
| No per-task pricing | Proxies: bring your own provider |
| Full access to browser configuration and extensions | Software updates: pull new Docker images manually |
| Deploy in air-gapped or restricted networks | Database backups and maintenance |

<Note>
The most significant operational difference is **proxies**. Skyvern Cloud routes browser traffic through managed residential proxies to avoid bot detection. Self-hosted deployments need you to configure your own proxy provider.
</Note>

## Prerequisites

Before deploying, ensure you have:

<Steps>
  <Step title="Docker and Docker Compose">
    Required for containerized deployment. [Install Docker](https://docs.docker.com/get-docker/)
  </Step>
  <Step title="4GB+ RAM">
    Browser instances are memory-intensive. Production deployments benefit from 8GB+.
  </Step>
  <Step title="LLM API key">
    From OpenAI, Anthropic, Azure OpenAI, Google Gemini, or AWS Bedrock. Alternatively, run local models with Ollama.
  </Step>
  <Step title="Proxy provider (recommended)">
    For automating external websites at scale. Not required for internal tools or development.
  </Step>
</Steps>

PostgreSQL 14+ is included in the Docker Compose setup. If you prefer an external database, you can configure `DATABASE_STRING` to point to your own instance.

## Choose your deployment method

| Method | Best for |
|--------|----------|
| **Docker Compose** | Getting started, small teams, single-server deployments |
| **Kubernetes** | Production at scale, teams with existing K8s infrastructure, high availability requirements |

Most teams start with Docker Compose. It's the fastest path to a working deployment. Move to Kubernetes when you need horizontal scaling or want to integrate with existing orchestration infrastructure.

## Next steps

<CardGroup cols={2}>
  <Card title="Docker Setup" icon="docker" href="/self-hosted/docker">
    Get Skyvern running in 10 minutes with Docker Compose
  </Card>
  <Card title="Kubernetes Deployment" icon="dharmachakra" href="/self-hosted/kubernetes">
    Deploy to production with Kubernetes manifests
  </Card>
</CardGroup>
236
docs/self-hosted/proxy.mdx
Normal file
@@ -0,0 +1,236 @@
---
title: Proxy Setup
subtitle: Configure proxies to avoid bot detection
slug: self-hosted/proxy
---

Many websites block requests from datacenter IPs or detect automated browser patterns. Skyvern Cloud includes managed residential proxies that handle this automatically. Self-hosted deployments require you to configure your own proxy provider.

## Why you need proxies

Without proxies, your browser automation traffic originates from your server's IP address. This causes issues when:

- **Target sites block datacenter IPs**: Many sites automatically block traffic from known hosting providers (AWS, GCP, Azure)
- **Rate limiting**: Repeated requests from one IP trigger rate limits
- **Geo-restrictions**: Sites serve different content based on location
- **Bot detection**: Some sites fingerprint datacenter traffic patterns

<Note>
If you're automating internal tools or sites that don't have bot detection, you may not need proxies at all. Test without proxies first.
</Note>

---

## Proxy types

### Residential proxies

Traffic appears to come from real home internet connections. Most expensive but least likely to be blocked. Recommended for browser automation. Start here unless cost is a primary concern.

**Providers:**
- [Bright Data](https://brightdata.com/)
- [Oxylabs](https://oxylabs.io/)
- [Smartproxy](https://smartproxy.com/)
- [IPRoyal](https://iproyal.com/)

### ISP proxies

Static IPs from internet service providers. Good balance between cost and detection avoidance.

### Datacenter proxies

IPs from cloud providers. Cheapest but most likely to be blocked.

### Rotating vs. static

See [Rotating proxies vs. sticky sessions](#rotating-proxies-vs-sticky-sessions) for guidance on which to use.

---

## Configuration

Skyvern supports proxy configuration at the browser level through Playwright.

### Environment variable approach

Set proxy configuration in your `.env` file:

```bash .env
ENABLE_PROXY=true

# Single proxy
HOSTED_PROXY_POOL=http://user:pass@proxy.example.com:8080

# Multiple proxies (use instead of the single-proxy line above):
# Skyvern randomly selects one per browser session
# HOSTED_PROXY_POOL=http://user:pass@proxy1.example.com:8080,http://user:pass@proxy2.example.com:8080
```

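To make the per-session selection concrete, the sketch below picks one entry at random from a comma-separated pool. It illustrates the behavior described above and is not Skyvern's implementation; `pick_proxy` is a hypothetical name.

```bash
# Pick one entry at random from a comma-separated proxy list.
pick_proxy() {
  printf '%s\n' "$1" | tr ',' '\n' |
    awk 'BEGIN { srand() } { pool[NR] = $0 } END { if (NR) print pool[int(rand() * NR) + 1] }'
}
```

Calling `pick_proxy "$HOSTED_PROXY_POOL"` prints one of the configured URLs. With a single entry it always returns that entry.
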
<Note>
Skyvern Cloud supports a `proxy_location` parameter on task requests for geographic targeting (e.g., `RESIDENTIAL_US`). This feature is not available in self-hosted deployments. All tasks use the proxy configured in `HOSTED_PROXY_POOL`.
</Note>

---

## Setting up a proxy provider

### Step 1: Choose a provider

For browser automation, residential proxies work best. See [proxy types](#proxy-types) above.

### Step 2: Configure Skyvern

Add your proxy to the environment:

```bash .env
ENABLE_PROXY=true
HOSTED_PROXY_POOL=http://username:password@proxy.provider.com:8080
```

### Step 3: Test the connection

Run a simple task that checks your IP:

```bash
curl -s http://localhost:8000/v1/tasks \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is the IP address shown on this page?",
    "url": "https://whatismyipaddress.com"
  }'
```

The task result should show an IP from your proxy provider, not your server's IP.

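Before involving Skyvern at all, you can confirm the proxy itself works by comparing the IP seen with and without it. This is a sketch: `proxy_in_effect` is a hypothetical helper, and the commented commands assume `ifconfig.me` is reachable and that your provider accepts credentials in the URL.

```bash
# Return 0 when the IP seen through the proxy is non-empty and differs
# from the server's own public IP.
proxy_in_effect() {
  direct_ip="$1"
  proxied_ip="$2"
  [ -n "$proxied_ip" ] && [ "$direct_ip" != "$proxied_ip" ]
}

# Example (needs network access, so it is left commented out):
#   direct=$(curl -s https://ifconfig.me)
#   proxied=$(curl -s --proxy "http://username:password@proxy.provider.com:8080" https://ifconfig.me)
#   proxy_in_effect "$direct" "$proxied" && echo "proxy is working"
```
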
---

## Proxy authentication methods

### Basic auth (most common)

Include credentials in the URL:

```bash
http://username:password@proxy.example.com:8080
```

### IP whitelist

Some providers allow you to whitelist your server's IP instead of using credentials:

1. Get your server's public IP: `curl ifconfig.me`
2. Add it to your proxy provider's whitelist
3. Use the proxy without credentials:

   ```bash
   http://proxy.example.com:8080
   ```

---

## Geographic targeting
|
||||||
|
|
||||||
|
If your proxy provider supports geographic targeting, configure it in your proxy URL. The exact format depends on the provider.
|
||||||
|
|
||||||
|
### Bright Data example
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Target US residential
|
||||||
|
http://user-country-us:pass@proxy.brightdata.com:8080
|
||||||
|
|
||||||
|
# Target specific US state
|
||||||
|
http://user-country-us-state-california:pass@proxy.brightdata.com:8080
|
||||||
|
```
|
||||||
|
|
||||||
|
### Oxylabs example
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Target UK
|
||||||
|
http://user-country-gb:pass@proxy.oxylabs.io:8080
|
||||||
|
```
|
||||||
|
|
||||||
|
The usernames and hostnames above are illustrative. Check your provider's documentation for the exact endpoint and parameter format.
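
As a rough sketch of the pattern used in the examples above, a helper can assemble the targeting username from its parts. The `user-country-xx-state-yy` layout is copied from those examples and is an assumption about your provider; `geo_username` is a hypothetical name.

```python
from typing import Optional


def geo_username(base_user: str, country: str, state: Optional[str] = None) -> str:
    """Append country (and optionally state) targeting to a proxy username."""
    parts = [base_user, "country", country.lower()]
    if state:
        parts += ["state", state.lower()]
    return "-".join(parts)


print(geo_username("user", "US"))                # user-country-us
print(geo_username("user", "US", "California"))  # user-country-us-state-california
```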
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Rotating proxies vs. sticky sessions
|
||||||
|
|
||||||
|
### Rotating (new IP per request)
|
||||||
|
|
||||||
|
Good for:
|
||||||
|
- High-volume scraping
|
||||||
|
- Avoiding per-IP rate limits
|
||||||
|
- Tasks that don't need session persistence
|
||||||
|
|
||||||
|
### Sticky sessions (same IP for duration)
|
||||||
|
|
||||||
|
Good for:
|
||||||
|
- Multi-step automations where the site tracks your session
|
||||||
|
- Login flows
|
||||||
|
- Sites that block IP changes mid-session
|
||||||
|
|
||||||
|
Most providers support sticky sessions via a session ID parameter:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Bright Data sticky session
|
||||||
|
http://user-session-abc123:pass@proxy.brightdata.com:8080
|
||||||
|
```
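
A sketch of generating such a username, assuming the `user-session-ID` layout from the example above (the real parameter name varies by provider). `sticky_username` is a hypothetical helper.

```python
import uuid
from typing import Optional


def sticky_username(base_user: str, session_id: Optional[str] = None) -> str:
    """Append a session ID to the proxy username; generate one if none is given."""
    session_id = session_id or uuid.uuid4().hex[:8]
    return f"{base_user}-session-{session_id}"


print(sticky_username("user", "abc123"))  # user-session-abc123
print(sticky_username("user"))            # random 8-character session ID
```

Since `HOSTED_PROXY_POOL` is set once in the environment, a session ID embedded in it would presumably be shared by every task until the value is changed.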
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### "Connection refused" or timeout errors
|
||||||
|
|
||||||
|
- Verify your proxy endpoint and credentials are correct
|
||||||
|
- Check if your server can reach the proxy: `curl -x http://user:pass@proxy:port http://example.com`
|
||||||
|
- Ensure your provider hasn't blocked your IP
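
If `curl` is not available on the host, the reachability check can be approximated with a plain TCP connection test. This sketch only confirms that the proxy port accepts connections; it does not validate credentials. Function names are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def proxy_endpoint(proxy_url: str) -> tuple[str, int]:
    """Extract (host, port) from a proxy URL, ignoring any credentials."""
    parts = urlsplit(proxy_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("proxy URL must include a host and a port")
    return parts.hostname, parts.port


def can_connect(proxy_url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the proxy endpoint succeeds."""
    host, port = proxy_endpoint(proxy_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(proxy_endpoint("http://user:pass@proxy.example.com:8080"))
```

A `False` result from `can_connect` points at networking (firewall, DNS, wrong port) rather than at credentials.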
|
||||||
|
|
||||||
|
### Target site still blocking requests
|
||||||
|
|
||||||
|
- Try a different proxy location
|
||||||
|
- Use residential instead of datacenter proxies
|
||||||
|
- Enable sticky sessions if the site tracks session changes
|
||||||
|
- Verify the proxy is actually being used (check the IP)
|
||||||
|
|
||||||
|
### Slow performance
|
||||||
|
|
||||||
|
- Expect some added latency: a proxy typically adds on the order of 100-500 ms per request, depending on the provider and location
|
||||||
|
- Choose a proxy location geographically close to the target site
|
||||||
|
- Use datacenter proxies for sites that allow them (faster than residential)
|
||||||
|
|
||||||
|
### High proxy costs
|
||||||
|
|
||||||
|
Residential proxy bandwidth is expensive. To reduce costs:
|
||||||
|
- Disable video recording (reduces bandwidth)
|
||||||
|
- Use datacenter proxies for sites that allow them
|
||||||
|
- Cache resources where possible
|
||||||
|
- Minimize unnecessary page loads
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Running without proxies
|
||||||
|
|
||||||
|
For internal tools or development, proxies aren't always necessary:
|
||||||
|
|
||||||
|
```bash .env
|
||||||
|
ENABLE_PROXY=false
|
||||||
|
```
|
||||||
|
|
||||||
|
Your browser traffic will originate directly from your server's IP. This works well for:
|
||||||
|
- Internal applications
|
||||||
|
- Development and testing
|
||||||
|
- Sites that don't block datacenter traffic
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Next steps
|
||||||
|
|
||||||
|
<CardGroup cols={2}>
|
||||||
|
<Card title="Storage Configuration" icon="hard-drive" href="/self-hosted/storage">
|
||||||
|
Store recordings and artifacts in S3 or Azure Blob
|
||||||
|
</Card>
|
||||||
|
<Card title="Kubernetes Deployment" icon="dharmachakra" href="/self-hosted/kubernetes">
|
||||||
|
Deploy Skyvern at scale with Kubernetes
|
||||||
|
</Card>
|
||||||
|
</CardGroup>
|
||||||
355
docs/self-hosted/storage.mdx
Normal file
@@ -0,0 +1,355 @@
|
|||||||
|
---
|
||||||
|
title: Storage Configuration
|
||||||
|
subtitle: Configure where Skyvern stores artifacts and recordings
|
||||||
|
slug: self-hosted/storage
|
||||||
|
---
|
||||||
|
|
||||||
|
Skyvern generates several types of artifacts during task execution: screenshots, browser recordings, HAR files, and extracted data. By default, these are stored on the local filesystem. For production deployments, you can configure S3 or Azure Blob Storage.
|
||||||
|
|
||||||
|
## Storage types
|
||||||
|
|
||||||
|
Skyvern supports three storage backends:
|
||||||
|
|
||||||
|
| Type | `SKYVERN_STORAGE_TYPE` | Best for |
|
||||||
|
|------|------------------------|----------|
|
||||||
|
| Local filesystem | `local` | Development, single-server deployments |
|
||||||
|
| AWS S3 | `s3` | Production on AWS, multi-server deployments |
|
||||||
|
| Azure Blob | `azureblob` | Production on Azure |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Local storage (default)
|
||||||
|
|
||||||
|
By default, Skyvern stores all artifacts in a local directory.
|
||||||
|
|
||||||
|
```bash .env
|
||||||
|
SKYVERN_STORAGE_TYPE=local
|
||||||
|
ARTIFACT_STORAGE_PATH=/data/artifacts
|
||||||
|
VIDEO_PATH=/data/videos
|
||||||
|
HAR_PATH=/data/har
|
||||||
|
LOG_PATH=/data/log
|
||||||
|
```
|
||||||
|
|
||||||
|
### Docker volume mounts
|
||||||
|
|
||||||
|
When using Docker Compose, these paths are mounted from your host:
|
||||||
|
|
||||||
|
```yaml docker-compose.yml
|
||||||
|
volumes:
|
||||||
|
- ./artifacts:/data/artifacts
|
||||||
|
- ./videos:/data/videos
|
||||||
|
- ./har:/data/har
|
||||||
|
- ./log:/data/log
|
||||||
|
```
|
||||||
|
|
||||||
|
### Limitations
|
||||||
|
|
||||||
|
Local storage works well for single-server deployments but has limitations:
|
||||||
|
- Not accessible across multiple servers
|
||||||
|
- No automatic backup or redundancy
|
||||||
|
- Requires manual cleanup to manage disk space
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## AWS S3
|
||||||
|
|
||||||
|
Store artifacts in S3 for durability, scalability, and access from multiple servers.
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
|
||||||
|
```bash .env
|
||||||
|
SKYVERN_STORAGE_TYPE=s3
|
||||||
|
AWS_REGION=us-east-1
|
||||||
|
AWS_S3_BUCKET_ARTIFACTS=your-skyvern-artifacts
|
||||||
|
AWS_S3_BUCKET_SCREENSHOTS=your-skyvern-screenshots
|
||||||
|
AWS_S3_BUCKET_BROWSER_SESSIONS=your-skyvern-browser-sessions
|
||||||
|
AWS_S3_BUCKET_UPLOADS=your-skyvern-uploads
|
||||||
|
|
||||||
|
# Pre-signed URL expiration (seconds) - default 24 hours
|
||||||
|
PRESIGNED_URL_EXPIRATION=86400
|
||||||
|
|
||||||
|
# Maximum upload file size (bytes) - default 10MB
|
||||||
|
MAX_UPLOAD_FILE_SIZE=10485760
|
||||||
|
```
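
Before restarting Skyvern, it can help to confirm that the S3 settings are all present. The check below is a sketch: it assumes every variable in the block above must be set explicitly, which may be stricter than Skyvern itself, and `missing_s3_settings` is a hypothetical helper.

```python
import os
from typing import Mapping

# Assumption: these mirror the variables shown in the configuration block above.
REQUIRED_S3_VARS = (
    "AWS_REGION",
    "AWS_S3_BUCKET_ARTIFACTS",
    "AWS_S3_BUCKET_SCREENSHOTS",
    "AWS_S3_BUCKET_BROWSER_SESSIONS",
    "AWS_S3_BUCKET_UPLOADS",
)


def missing_s3_settings(env: Mapping[str, str]) -> list[str]:
    """List S3 variables that are unset or empty when the storage type is s3."""
    if env.get("SKYVERN_STORAGE_TYPE") != "s3":
        return []
    return [name for name in REQUIRED_S3_VARS if not env.get(name)]


print(missing_s3_settings(os.environ))
```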
|
||||||
|
|
||||||
|
### Authentication
|
||||||
|
|
||||||
|
Skyvern uses the standard AWS credential chain. Configure credentials using one of these methods:
|
||||||
|
|
||||||
|
**Environment variables:**
|
||||||
|
|
||||||
|
```bash .env
|
||||||
|
AWS_ACCESS_KEY_ID=AKIA...
|
||||||
|
AWS_SECRET_ACCESS_KEY=...
|
||||||
|
```
|
||||||
|
|
||||||
|
**IAM role (recommended for EC2/ECS/EKS):**
|
||||||
|
|
||||||
|
Attach an IAM role with S3 permissions to your instance or pod. No credentials are needed in the environment.
|
||||||
|
|
||||||
|
**AWS profile:**
|
||||||
|
|
||||||
|
```bash .env
|
||||||
|
AWS_PROFILE=your-profile-name
|
||||||
|
```
|
||||||
|
|
||||||
|
### Required IAM permissions
|
||||||
|
|
||||||
|
Create an IAM policy with these permissions:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"Version": "2012-10-17",
|
||||||
|
"Statement": [
|
||||||
|
{
|
||||||
|
"Effect": "Allow",
|
||||||
|
"Action": [
|
||||||
|
"s3:GetObject",
|
||||||
|
"s3:PutObject",
|
||||||
|
"s3:DeleteObject",
|
||||||
|
"s3:ListBucket"
|
||||||
|
],
|
||||||
|
"Resource": [
|
||||||
|
"arn:aws:s3:::your-skyvern-artifacts",
|
||||||
|
"arn:aws:s3:::your-skyvern-artifacts/*",
|
||||||
|
"arn:aws:s3:::your-skyvern-screenshots",
|
||||||
|
"arn:aws:s3:::your-skyvern-screenshots/*",
|
||||||
|
"arn:aws:s3:::your-skyvern-browser-sessions",
|
||||||
|
"arn:aws:s3:::your-skyvern-browser-sessions/*",
|
||||||
|
"arn:aws:s3:::your-skyvern-uploads",
|
||||||
|
"arn:aws:s3:::your-skyvern-uploads/*"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
```
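
If your bucket names differ from the placeholders, the policy can be generated rather than edited by hand. This sketch reproduces the structure of the policy above for an arbitrary list of buckets; `skyvern_s3_policy` is a hypothetical helper name.

```python
import json

ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"]


def skyvern_s3_policy(buckets: list[str]) -> dict:
    """Build an IAM policy covering each bucket and the objects inside it."""
    resources = []
    for bucket in buckets:
        resources.append(f"arn:aws:s3:::{bucket}")    # bucket-level (ListBucket)
        resources.append(f"arn:aws:s3:::{bucket}/*")  # object-level (Get/Put/Delete)
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": ACTIONS, "Resource": resources}],
    }


policy = skyvern_s3_policy(
    [
        "your-skyvern-artifacts",
        "your-skyvern-screenshots",
        "your-skyvern-browser-sessions",
        "your-skyvern-uploads",
    ]
)
print(json.dumps(policy, indent=2))
```

Both ARN forms are needed because `s3:ListBucket` applies to the bucket itself, while the object actions apply to the `/*` resources.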
|
||||||
|
|
||||||
|
### Creating the buckets
|
||||||
|
|
||||||
|
Create the S3 buckets in your AWS account:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
aws s3 mb s3://your-skyvern-artifacts --region us-east-1
|
||||||
|
aws s3 mb s3://your-skyvern-screenshots --region us-east-1
|
||||||
|
aws s3 mb s3://your-skyvern-browser-sessions --region us-east-1
|
||||||
|
aws s3 mb s3://your-skyvern-uploads --region us-east-1
|
||||||
|
```
|
||||||
|
|
||||||
|
<Note>
|
||||||
|
Bucket names must be globally unique across all AWS accounts. Add a unique prefix or suffix (e.g., your company name or a random string).
|
||||||
|
</Note>
|
||||||
|
|
||||||
|
### Bucket configuration recommendations
|
||||||
|
|
||||||
|
**Lifecycle rules:** Configure automatic deletion of old artifacts to control costs.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
aws s3api put-bucket-lifecycle-configuration \
|
||||||
|
--bucket your-skyvern-artifacts \
|
||||||
|
--lifecycle-configuration '{
|
||||||
|
"Rules": [
|
||||||
|
{
|
||||||
|
"ID": "DeleteOldArtifacts",
|
||||||
|
"Status": "Enabled",
|
||||||
|
"Filter": {},
|
||||||
|
"Expiration": {
|
||||||
|
"Days": 30
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}'
|
||||||
|
```
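
To apply the same rule to more than one bucket, the commands can be generated. The sketch below prints one `put-bucket-lifecycle-configuration` command per bucket, reusing the rule shown above. It deliberately lists only the artifacts and screenshots buckets, on the assumption that expiring saved browser sessions or uploads would be undesirable; adjust for your retention needs.

```python
import json
import shlex


def lifecycle_config(days: int) -> dict:
    """Build the same expiration rule as the CLI example, with a configurable age."""
    return {
        "Rules": [
            {
                "ID": "DeleteOldArtifacts",
                "Status": "Enabled",
                "Filter": {},
                "Expiration": {"Days": days},
            }
        ]
    }


def lifecycle_commands(buckets: list[str], days: int = 30) -> list[str]:
    """Return one shell-quoted AWS CLI command per bucket."""
    config = shlex.quote(json.dumps(lifecycle_config(days)))
    return [
        "aws s3api put-bucket-lifecycle-configuration "
        f"--bucket {shlex.quote(bucket)} --lifecycle-configuration {config}"
        for bucket in buckets
    ]


for command in lifecycle_commands(["your-skyvern-artifacts", "your-skyvern-screenshots"]):
    print(command)
```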
|
||||||
|
|
||||||
|
**Encryption:** Enable server-side encryption for data at rest.
|
||||||
|
|
||||||
|
**Access logging:** Enable access logging for audit trails.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Azure Blob Storage
|
||||||
|
|
||||||
|
Store artifacts in Azure Blob Storage for Azure-based deployments.
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
|
||||||
|
```bash .env
|
||||||
|
SKYVERN_STORAGE_TYPE=azureblob
|
||||||
|
AZURE_STORAGE_ACCOUNT_NAME=yourstorageaccount
|
||||||
|
AZURE_STORAGE_ACCOUNT_KEY=your-storage-account-key
|
||||||
|
AZURE_STORAGE_CONTAINER_ARTIFACTS=skyvern-artifacts
|
||||||
|
AZURE_STORAGE_CONTAINER_SCREENSHOTS=skyvern-screenshots
|
||||||
|
AZURE_STORAGE_CONTAINER_BROWSER_SESSIONS=skyvern-browser-sessions
|
||||||
|
AZURE_STORAGE_CONTAINER_UPLOADS=skyvern-uploads
|
||||||
|
|
||||||
|
# Pre-signed URL expiration (seconds) - default 24 hours
|
||||||
|
PRESIGNED_URL_EXPIRATION=86400
|
||||||
|
|
||||||
|
# Maximum upload file size (bytes) - default 10MB
|
||||||
|
MAX_UPLOAD_FILE_SIZE=10485760
|
||||||
|
```
|
||||||
|
|
||||||
|
### Creating the storage account and containers
|
||||||
|
|
||||||
|
Using Azure CLI:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Create resource group
|
||||||
|
az group create --name skyvern-rg --location eastus
|
||||||
|
|
||||||
|
# Create storage account
|
||||||
|
az storage account create \
|
||||||
|
--name yourstorageaccount \
|
||||||
|
--resource-group skyvern-rg \
|
||||||
|
--location eastus \
|
||||||
|
--sku Standard_LRS
|
||||||
|
|
||||||
|
# Get the account key
|
||||||
|
az storage account keys list \
|
||||||
|
--account-name yourstorageaccount \
|
||||||
|
--resource-group skyvern-rg \
|
||||||
|
--query '[0].value' -o tsv
|
||||||
|
|
||||||
|
# Create containers
|
||||||
|
az storage container create --name skyvern-artifacts --account-name yourstorageaccount
|
||||||
|
az storage container create --name skyvern-screenshots --account-name yourstorageaccount
|
||||||
|
az storage container create --name skyvern-browser-sessions --account-name yourstorageaccount
|
||||||
|
az storage container create --name skyvern-uploads --account-name yourstorageaccount
|
||||||
|
```
|
||||||
|
|
||||||
|
### Using Managed Identity (recommended)
|
||||||
|
|
||||||
|
For Azure VMs or AKS, use Managed Identity instead of storage account keys:
|
||||||
|
|
||||||
|
1. Enable Managed Identity on your VM or AKS cluster
|
||||||
|
2. Grant the identity "Storage Blob Data Contributor" role on the storage account
|
||||||
|
3. Omit `AZURE_STORAGE_ACCOUNT_KEY` from your configuration
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What gets stored where
|
||||||
|
|
||||||
|
| Artifact type | S3 bucket / Azure container | Contents |
|
||||||
|
|---------------|---------------------------|----------|
|
||||||
|
| Artifacts | `*-artifacts` | Extracted data, HTML snapshots, logs |
|
||||||
|
| Screenshots | `*-screenshots` | Page screenshots at each step |
|
||||||
|
| Browser Sessions | `*-browser-sessions` | Saved browser state for profiles |
|
||||||
|
| Uploads | `*-uploads` | User-uploaded files for workflows |
|
||||||
|
|
||||||
|
Videos (recordings) are currently always stored locally in `VIDEO_PATH` regardless of storage type.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Pre-signed URLs
|
||||||
|
|
||||||
|
When artifacts are stored in S3 or Azure Blob, Skyvern generates pre-signed URLs for access. These URLs:
|
||||||
|
|
||||||
|
- Expire after `PRESIGNED_URL_EXPIRATION` seconds (default: 24 hours)
|
||||||
|
- Allow direct download without additional authentication
|
||||||
|
- Are included in task responses (`recording_url`, `screenshot_urls`)
|
||||||
|
|
||||||
|
Adjust the expiration based on your needs:
|
||||||
|
|
||||||
|
```bash .env
|
||||||
|
# 1 hour
|
||||||
|
PRESIGNED_URL_EXPIRATION=3600
|
||||||
|
|
||||||
|
# 7 days
|
||||||
|
PRESIGNED_URL_EXPIRATION=604800
|
||||||
|
```
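
The values above are plain second counts. A short sketch of deriving them from readable durations follows; `expiration_seconds` is a hypothetical helper. Note that S3 limits SigV4 pre-signed URLs to seven days, and URLs signed with temporary credentials (such as an IAM role) stop working when those credentials expire, so larger values may not take effect on S3.

```python
from datetime import timedelta

# S3 SigV4 pre-signed URLs are valid for at most seven days.
S3_SIGV4_MAX_SECONDS = 7 * 24 * 60 * 60


def expiration_seconds(**duration: float) -> int:
    """Convert timedelta keyword arguments (hours=, days=, ...) to whole seconds."""
    seconds = int(timedelta(**duration).total_seconds())
    if seconds <= 0:
        raise ValueError("expiration must be positive")
    return seconds


print(expiration_seconds(hours=1))   # 3600
print(expiration_seconds(hours=24))  # 86400
print(expiration_seconds(days=7))    # 604800
```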
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Migrating from local to cloud storage
|
||||||
|
|
||||||
|
To migrate existing artifacts from local storage to S3 or Azure:
|
||||||
|
|
||||||
|
### S3
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Sync local artifacts to S3
|
||||||
|
aws s3 sync ./artifacts s3://your-skyvern-artifacts/
|
||||||
|
|
||||||
|
# Update configuration
|
||||||
|
# SKYVERN_STORAGE_TYPE=s3
|
||||||
|
# ...
|
||||||
|
|
||||||
|
# Restart Skyvern
|
||||||
|
docker compose restart skyvern
|
||||||
|
```
|
||||||
|
|
||||||
|
### Azure
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Install azcopy
|
||||||
|
# https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10
|
||||||
|
|
||||||
|
# Copy local artifacts to Azure (authorize first with `azcopy login`, or append a SAS token to the destination URL)
|
||||||
|
azcopy copy './artifacts/*' 'https://yourstorageaccount.blob.core.windows.net/skyvern-artifacts' --recursive
|
||||||
|
|
||||||
|
# Update configuration and restart
|
||||||
|
```
|
||||||
|
|
||||||
|
<Warning>
|
||||||
|
Switching the storage type only affects new artifacts: Skyvern does not move existing local files to cloud storage automatically. The commands above are a one-time copy, not a continuous sync.
|
||||||
|
</Warning>
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Disk space management
|
||||||
|
|
||||||
|
### Local storage
|
||||||
|
|
||||||
|
Monitor disk usage and clean up old artifacts periodically:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check disk usage
|
||||||
|
du -sh ./artifacts ./videos ./har ./log
|
||||||
|
|
||||||
|
# Remove artifacts older than 30 days
|
||||||
|
find ./artifacts -type f -mtime +30 -delete
|
||||||
|
find ./videos -type f -mtime +30 -delete
|
||||||
|
```
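
For a cleanup job that previews what it would remove, the `find` commands can be approximated in Python. This is a sketch with a dry-run default; it compares modification times against an exact cutoff, so results can differ slightly from `find -mtime +30`, which counts whole days. Function names are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def old_files(root: Path, max_age_days: int, now: Optional[float] = None) -> list[Path]:
    """Return files under root whose modification time is older than the cutoff."""
    cutoff = (now if now is not None else time.time()) - max_age_days * 86400
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def delete_old_files(root: Path, max_age_days: int, dry_run: bool = True) -> list[Path]:
    """List old files and delete them unless dry_run is set."""
    targets = old_files(root, max_age_days)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets


# Usage: preview first, then delete.
#   delete_old_files(Path("./artifacts"), 30)
#   delete_old_files(Path("./artifacts"), 30, dry_run=False)
```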
|
||||||
|
|
||||||
|
### Cloud storage
|
||||||
|
|
||||||
|
Use lifecycle policies to automatically delete old objects:
|
||||||
|
|
||||||
|
**S3:** Configure lifecycle rules to expire objects after N days.
|
||||||
|
|
||||||
|
**Azure:** Configure lifecycle management policies in the Azure portal or via CLI.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### "Access Denied" errors
|
||||||
|
|
||||||
|
- Verify your credentials are correct
|
||||||
|
- Check IAM permissions include all required actions
|
||||||
|
- Ensure the buckets/containers exist
|
||||||
|
- For S3, verify the AWS region matches your bucket location
|
||||||
|
|
||||||
|
### Pre-signed URLs not working
|
||||||
|
|
||||||
|
- Check that `PRESIGNED_URL_EXPIRATION` hasn't elapsed
|
||||||
|
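- Verify that the credentials Skyvern uses to sign URLs can read the object (for S3, `s3:GetObject`); a pre-signed URL only grants what the signer is allowed to do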
|
||||||
|
- The bucket does not need to be public: S3 Block Public Access settings do not affect pre-signed URLs, so they can stay enabled
|
||||||
|
|
||||||
|
### Artifacts not appearing
|
||||||
|
|
||||||
|
- Check Skyvern logs for storage errors: `docker compose logs skyvern | grep -i storage`
|
||||||
|
- Verify the storage type is correctly set: `SKYVERN_STORAGE_TYPE`
|
||||||
|
- Ensure network connectivity to the storage endpoint
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Next steps
|
||||||
|
|
||||||
|
<CardGroup cols={2}>
|
||||||
|
<Card title="Docker Setup" icon="docker" href="/self-hosted/docker">
|
||||||
|
Return to the Docker setup guide
|
||||||
|
</Card>
|
||||||
|
<Card title="Kubernetes Deployment" icon="dharmachakra" href="/self-hosted/kubernetes">
|
||||||
|
Deploy Skyvern at scale
|
||||||
|
</Card>
|
||||||
|
</CardGroup>
|
||||||