Add Mintlify documentation setup (#4516)

Co-authored-by: Suchintan <suchintan@users.noreply.github.com>
This commit is contained in:
Naman
2026-01-24 12:04:01 +05:30
committed by GitHub
parent b44c0ac3df
commit eaefc572fc
156 changed files with 5600 additions and 0 deletions

View File

@@ -0,0 +1,406 @@
---
title: Core Concepts
subtitle: The building blocks of Skyvern automations
slug: getting-started/core-concepts
---
Skyvern has eight core concepts. Master these and you can build any browser automation.
---
## Tasks
A **Task** is a single automation job—think of it as asking Skyvern to do one thing. You provide a prompt describing the goal, and Skyvern navigates the browser to complete it.
```python
from skyvern import Skyvern
skyvern = Skyvern(api_key="YOUR_API_KEY")
result = await skyvern.run_task(
prompt="Find the top 3 posts on the front page",
url="https://news.ycombinator.com",
data_extraction_schema={
"type": "object",
"properties": {
"posts": {
"type": "array",
"items": {"type": "string"}
}
}
}
)
print(result.output) # {"posts": ["Post 1", "Post 2", "Post 3"]}
```
**Key properties:**
| Property | Description |
|----------|-------------|
| `prompt` | Natural language description of the goal (required) |
| `url` | Starting URL for the task |
| `data_extraction_schema` | JSON Schema defining expected output structure |
| `max_steps` | Maximum steps allowed (controls cost) |
| `engine` | AI model to use (see [Engines](#engines) below) |
| `browser_session_id` | Run in an existing browser session |
| `webhook_url` | Callback URL for async completion |
**Use Tasks for:** one-off automations, quick data extraction, prototyping before building a workflow.
When you need reusable, multi-step automations, use **Workflows** instead.
---
## Workflows
A **Workflow** is a reusable template composed of blocks. Unlike tasks, workflows can be versioned, shared across your team, and executed repeatedly with different parameters.
```
┌─────────────────────────────────────────────────────────────┐
│ WORKFLOW │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Navigate │ → │ Login │ → │ Extract │ → │ Download │ │
│ │ Block │ │ Block │ │ Block │ │ Block │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ Parameters: {{search_query}}, {{max_price}} │
│ Each block can reference outputs from previous blocks │
└─────────────────────────────────────────────────────────────┘
```
```python
result = await skyvern.run_workflow(
workflow_id="wpid_abc123",
parameters={
"search_query": "wireless headphones",
"max_price": 100
}
)
print(result.output)
```
Workflows support Jinja templating:
- Reference parameters: `{{search_query}}`
- Reference previous block outputs: `{{extract_block.product_name}}`
**Use Workflows for:** repeatable automations, multi-step processes, team-shared templates, complex logic with loops or conditionals.
See [Workflow Blocks](/workflows/workflow-blocks-details) for the full block reference.
---
## Blocks
Blocks are the building units of workflows. Each block performs one specific task, and blocks execute sequentially—each can reference outputs from previous blocks.
### Navigation & Interaction
| Block | Purpose |
|-------|---------|
| **Navigation** | AI-guided navigation toward a goal—Skyvern figures out the clicks and inputs |
| **Action** | Perform a single specific action (click, type, select, upload) |
| **Go to URL** | Navigate directly to a specific URL |
| **Login** | Authenticate using stored credentials |
| **Wait** | Pause execution for a specified duration |
| **Human Interaction** | Pause for manual intervention (CAPTCHA, approval, etc.) |
### Data & Files
| Block | Purpose |
|-------|---------|
| **Extract** | Pull structured data from a page into JSON |
| **File Download** | Download files from websites |
| **File Upload** | Upload files to form fields |
| **File Parser** | Process PDFs, CSVs, and Excel files |
| **PDF Parser** | Specialized PDF text extraction |
### Logic & Control Flow
| Block | Purpose |
|-------|---------|
| **Conditional** | Branch workflow based on conditions (if/else) |
| **For Loop** | Repeat a sequence of blocks over a list |
| **Validation** | Assert conditions; halt workflow on failure |
| **Code** | Execute custom Python/Playwright scripts |
### Communication & Integration
| Block | Purpose |
|-------|---------|
| **HTTP Request** | Make API calls to external services |
| **Text Prompt** | Text-only LLM prompt (no browser interaction) |
| **Send Email** | Send email messages |
For detailed block configuration, see [Workflow Blocks](/workflows/workflow-blocks-details).
---
## Runs
Every time you execute a task or workflow, Skyvern creates a **Run** to track progress and store outputs.
```
┌→ completed
├→ failed
created → queued → running ─┼→ timed_out
├→ terminated
└→ canceled
```
```python
result = await skyvern.run_task(
prompt="Extract the main heading",
url="https://example.com"
)
print(result.run_id) # "tsk_abc123"
print(result.status) # "completed"
print(result.output) # {"heading": "Welcome"}
```
**Run identifiers:**
- Task runs: `tsk_*` prefix
- Workflow runs: `wr_*` prefix
**Run response fields:**
| Field | Description |
|-------|-------------|
| `run_id` | Unique identifier |
| `status` | Current lifecycle state |
| `output` | Extracted data (matches your schema) |
| `recording_url` | Video of the execution |
| `screenshot_urls` | Screenshots captured during execution |
| `downloaded_files` | Files retrieved (with URLs and checksums) |
| `failure_reason` | Error details if failed |
| `step_count` | Number of steps taken |
<Warning>
**Billing:** You're billed per step. A step is one AI decision + action cycle. Use `max_steps` to cap costs during development.
</Warning>
---
## Credentials
Credentials provide secure storage for authentication data. Skyvern encrypts credentials at rest and in transit, and injects them directly into the browser—**credentials are never sent to or seen by the LLM**.
```
┌─────────────────────────────────────────────────────────────┐
│ SECURITY MODEL │
│ │
│ You store credentials Skyvern encrypts & stores │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ Skyvern UI │ │ Encrypted Vault │ │
│ │ or API │ ─────────── │ (or Bitwarden, │ │
│ └─────────────┘ │ 1Password) │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Login Block │ │
│ │ injects into │ │
│ │ browser fields │ │
│ └─────────────────┘ │
│ │ │
│ LLM sees: "login happened" │
│ LLM never sees: actual credentials │
└─────────────────────────────────────────────────────────────┘
```
**Supported credential types:**
- Usernames and passwords
- TOTP codes (authenticator apps)
- Credit cards
**Credential sources:**
- **Skyvern** — Native encrypted storage
- **Bitwarden** — Sync from your Bitwarden vault
- **1Password** — Sync from your 1Password vault
- **Azure Key Vault** — Enterprise secret management
See [Credentials](/credentials/introduction) for setup instructions.
---
## Browser Sessions & Profiles
Skyvern offers two ways to manage browser state across runs:
```
┌─────────────────────────────────────────────────────────────┐
│ │
│ BROWSER SESSION BROWSER PROFILE │
│ (live, temporary) (saved, reusable) │
│ │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ Active │ ──save──▶ │ Snapshot │ │
│ │ Browser │ │ of state │ │
│ │ Instance │ ◀──restore── │ (cookies, │ │
│ └─────────────┘ │ storage) │ │
│ └─────────────┘ │
│ • Expires after timeout • Persists indefinitely │
│ • For real-time chaining • For skipping login │
│ • Max 24 hours • Shared across team │
│ │
└─────────────────────────────────────────────────────────────┘
```
### Browser Sessions
A live browser instance that maintains state across multiple tasks. Use sessions when you need to chain tasks in real-time or allow human intervention.
```python
# Create a session (default: 60 minutes, max: 24 hours)
session = await skyvern.create_browser_session(timeout=120)
# Run tasks in the same session
await skyvern.run_task(
prompt="Log in with the test account",
url="https://example.com/login",
browser_session_id=session.browser_session_id
)
# Second task reuses the authenticated state
await skyvern.run_task(
prompt="Extract the account balance",
url="https://example.com/dashboard",
browser_session_id=session.browser_session_id
)
# Clean up
await skyvern.close_browser_session(
browser_session_id=session.browser_session_id
)
```
**Session limits:** 5 to 1,440 minutes (24 hours max). Default: 60 minutes.
### Browser Profiles
A saved snapshot of browser state. Unlike sessions, profiles persist indefinitely and can be reused across days or weeks—perfect for skipping login on repeated runs.
```python
# Create a profile from a completed run
profile = await skyvern.browser_profiles.create_browser_profile(
name="my-authenticated-profile",
workflow_run_id=run.run_id
)
# Future runs restore the authenticated state
await skyvern.run_workflow(
workflow_id="wpid_extract_data",
browser_profile_id=profile.browser_profile_id
)
```
**What's saved:** Cookies, authentication tokens, local storage, session storage.
See [Browser Sessions](/browser-sessions/introduction) for details.
---
## Artifacts
Every run generates artifacts for observability, debugging, and audit trails.
```python
result = await skyvern.run_task(
prompt="Download the quarterly report",
url="https://example.com"
)
print(result.recording_url) # Full video of execution
print(result.screenshot_urls) # List of screenshot URLs
print(result.downloaded_files) # [{"url": "...", "checksum": "..."}]
```
| Artifact | Description |
|----------|-------------|
| **Recordings** | End-to-end video of the entire run |
| **Screenshots** | Captured after each action |
| **Downloaded files** | Files retrieved during execution |
| **Logs** | JSON-structured logs at step, task, and workflow levels |
| **HAR files** | HTTP Archive data for network debugging |
In the Skyvern UI, go to **Runs** → click a run → view the **Actions** and **Recording** tabs.
---
## Engines
Skyvern supports multiple AI engines for task execution:
| Engine | Description |
|--------|-------------|
| `skyvern-2.0` | Latest Skyvern model (default, recommended) |
| `skyvern-1.0` | Previous Skyvern model |
| `openai-cua` | OpenAI Computer Use Agent |
| `anthropic-cua` | Anthropic Computer Use Agent |
| `ui-tars` | UI-TARS model |
Specify the engine when running a task:
```python
result = await skyvern.run_task(
prompt="Extract pricing data",
url="https://example.com",
engine="skyvern-2.0"
)
```
---
## Quick Reference
| I want to... | Use |
|--------------|-----|
| Run a one-off automation | [Task](#tasks) |
| Build a reusable multi-step process | [Workflow](#workflows) |
| Keep a browser open between tasks | [Browser Session](#browser-sessions) |
| Skip login on repeated runs | [Browser Profile](#browser-profiles) |
| Store secrets securely | [Credentials](#credentials) |
| Debug a failed run | [Artifacts](#artifacts) |
---
## Next Steps
<CardGroup cols={2}>
<Card
title="API Quickstart"
icon="rocket"
href="/getting-started/quickstart"
>
Run your first task in 5 minutes
</Card>
<Card
title="Build Workflows"
icon="diagram-project"
href="/workflows/run-workflows"
>
Create multi-step automations
</Card>
<Card
title="Workflow Blocks"
icon="cube"
href="/workflows/workflow-blocks-details"
>
Full reference for all block types
</Card>
<Card
title="API Reference"
icon="code"
href="/api-reference"
>
Complete API documentation
</Card>
</CardGroup>

View File

@@ -0,0 +1,393 @@
---
title: API Quickstart
slug: getting-started/quickstart
---
Run your first browser automation in 5 minutes. By the end of this guide, you'll scrape the top post from Hacker News using Skyvern's AI agent.
<Note>
Prefer a visual interface? Try the [Cloud UI](/cloud/getting-started) instead — no code required.
</Note>
## Step 1: Get your API key
<img src="/images/get-api-key.png" alt="Get Skyvern API key" />
Sign up at [app.skyvern.com](https://app.skyvern.com) and go to [Settings](https://app.skyvern.com/settings) to copy your API key.
When you make API calls, Skyvern spins up a cloud browser, executes your task with AI, and returns the results. You can watch the browser live at any time.
## Step 2: Install the SDK
<CodeGroup>
```bash Python
pip install skyvern
```
```bash TypeScript
npm install @skyvern/client
```
</CodeGroup>
<Accordion title="Troubleshooting: Python version errors">
The Skyvern SDK requires Python 3.11, 3.12, or 3.13. If you encounter version errors, try using pipx:
```bash
pipx install skyvern
```
pipx installs Python packages in isolated environments while making them globally available.
</Accordion>
## Step 3: Run your first task
Let's scrape the title of the #1 post on Hacker News. You only need two parameters:
- **`prompt`** — Natural language instructions for what the AI should do. Be specific about the data you want extracted.
- **`url`** — The starting page. Skyvern's AI will navigate from here based on your prompt.
The SDK uses async/await because Skyvern spins up a cloud browser and executes your task remotely, which can take 30-60 seconds.
<CodeGroup>
```python Python
import os
import asyncio
from skyvern import Skyvern
async def main():
client = Skyvern(api_key=os.getenv("SKYVERN_API_KEY"))
result = await client.run_task(
prompt="Go to news.ycombinator.com and get the title of the #1 post",
url="https://news.ycombinator.com",
)
print(f"Run ID: {result.run_id}")
print(f"Status: {result.status}")
asyncio.run(main())
```
```typescript TypeScript
import { SkyvernClient } from "@skyvern/client";
async function main() {
const client = new SkyvernClient({
apiKey: process.env.SKYVERN_API_KEY,
});
const result = await client.runTask({
body: {
prompt: "Go to news.ycombinator.com and get the title of the #1 post",
url: "https://news.ycombinator.com",
},
});
console.log(`Run ID: ${result.run_id}`);
console.log(`Status: ${result.status}`);
}
main();
```
```bash cURL
curl -X POST "https://api.skyvern.com/v1/run/tasks" \
-H "x-api-key: $SKYVERN_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Go to news.ycombinator.com and get the title of the #1 post",
"url": "https://news.ycombinator.com"
}'
```
</CodeGroup>
The response includes a `run_id` you'll use to check status and fetch results.
## Step 4: Check the status
Since tasks run asynchronously, you have two options:
1. **Polling** — Periodically check the task status (shown below)
2. **Webhooks** — Get notified when the task completes ([see webhooks guide](/running-tasks/webhooks-faq))
The code below polls every 5 seconds until the task reaches a terminal state. Once complete, `run.output` contains the extracted data as a dictionary.
<CodeGroup>
```python Python
import os
import asyncio
from skyvern import Skyvern
async def main():
client = Skyvern(api_key=os.getenv("SKYVERN_API_KEY"))
result = await client.run_task(
prompt="Go to news.ycombinator.com and get the title of the #1 post",
url="https://news.ycombinator.com",
)
run_id = result.run_id
print(f"Task started: {run_id}")
while True:
run = await client.get_run(run_id)
print(f"Status: {run.status}")
if run.status in ["completed", "failed", "terminated", "timed_out", "canceled"]:
break
await asyncio.sleep(5)
print(f"Final status: {run.status}")
print(f"Output: {run.output}")
asyncio.run(main())
```
```typescript TypeScript
import { SkyvernClient } from "@skyvern/client";
async function main() {
const client = new SkyvernClient({
apiKey: process.env.SKYVERN_API_KEY,
});
const result = await client.runTask({
body: {
prompt: "Go to news.ycombinator.com and get the title of the #1 post",
url: "https://news.ycombinator.com",
},
});
const runId = result.run_id;
console.log(`Task started: ${runId}`);
while (true) {
const run = await client.getRun(runId);
console.log(`Status: ${run.status}`);
if (["completed", "failed", "terminated", "timed_out", "canceled"].includes(run.status)) {
console.log(`Output: ${JSON.stringify(run.output)}`);
break;
}
await new Promise((resolve) => setTimeout(resolve, 5000));
}
}
main();
```
```bash cURL
RUN_ID="your_run_id_here"
while true; do
RESPONSE=$(curl -s -X GET "https://api.skyvern.com/v1/runs/$RUN_ID" \
-H "x-api-key: $SKYVERN_API_KEY")
STATUS=$(echo "$RESPONSE" | jq -r '.status')
echo "Status: $STATUS"
if [[ "$STATUS" == "completed" || "$STATUS" == "failed" || "$STATUS" == "terminated" || "$STATUS" == "timed_out" || "$STATUS" == "canceled" ]]; then
echo "$RESPONSE" | jq '.output'
break
fi
sleep 5
done
```
</CodeGroup>
**Run states:**
- `created` — Task initialized, not yet queued
- `queued` — Waiting for an available browser
- `running` — AI is navigating and executing
- `completed` — Task finished successfully
- `failed` — Task encountered an error
- `terminated` — Task was manually stopped
- `timed_out` — Task exceeded time limit
- `canceled` — Task was cancelled before starting
## Step 5: View your results
When the task completes, you'll get a response like this:
```json
{
"run_id": "tsk_v2_486305187432193504",
"status": "completed",
"output": {
"top_post_title": "Linux kernel framework for PCIe device emulation, in userspace"
},
"downloaded_files": [],
"recording_url": "https://skyvern-artifacts.s3.amazonaws.com/v1/production/.../recording.webm?...",
"screenshot_urls": ["https://skyvern-artifacts.s3.amazonaws.com/v1/production/.../screenshot_final.png?..."],
"app_url": "https://app.skyvern.com/runs/wr_486305187432193510",
"step_count": 2,
"run_type": "task_v2"
}
```
The `output` contains whatever data the AI extracted based on your prompt. The `app_url` links to the Cloud UI where you can view the full run details.
## Step 6: Watch the recording
Every task is recorded. There are two ways to access recordings:
### From the API response
The `recording_url` field is included in every completed run response:
```json
{
"run_id": "tsk_v2_486305187432193504",
"status": "completed",
"recording_url": "https://skyvern-artifacts.s3.amazonaws.com/v1/production/o_485917350850524254/tsk_.../recording.webm?AWSAccessKeyId=...&Signature=...&Expires=..."
}
```
### From the Cloud UI
Navigate to [Runs](https://app.skyvern.com/runs) and click on your run to see the Recording tab.
<img src="/images/view-recording.png" alt="Recording tab in Skyvern Cloud" />
### What you'll see
- **Live browser view** — Watch the AI navigate in real-time
- **Recording** — Full video replay of the session
- **Actions** — Step-by-step breakdown with screenshots
- **AI Reasoning** — See why the AI made each decision
This is invaluable for debugging and understanding how Skyvern interprets your prompts.
---
## Run with a local browser
You can run Skyvern with a browser on your own machine. This is useful for development, debugging, or automating internal tools on your local network.
**Prerequisites:**
- Skyvern SDK installed (`pip install skyvern`)
- PostgreSQL database (local install or Docker)
- An LLM API key (OpenAI, Anthropic, Azure OpenAI, Gemini, Ollama, or any OpenAI-compatible provider)
<Note>
Docker is optional. If you have PostgreSQL installed locally, Skyvern will detect and use it automatically. Use `skyvern init --no-postgres` to skip database setup entirely if you're managing PostgreSQL separately.
</Note>
### Set up local Skyvern
```bash
skyvern init
```
<p align="center">
<img src="/images/skyvern-init.gif" alt="Skyvern init interactive setup wizard" />
</p>
This interactive wizard will:
1. Set up your database (detects local PostgreSQL or uses Docker)
2. Configure your LLM provider
3. Choose browser mode (headless, headful, or connect to existing Chrome)
4. Generate local API credentials
5. Download the Chromium browser
This will generate a .env file that stores your local configuration, LLM api keys and your local `BASE_URL` and `SKYVERN_API_KEY`:
```.env
ENV='local'
ENABLE_OPENAI='true'
OPENAI_API_KEY='<API_KEY>'
...
LLM_KEY='OPENAI_GPT4O'
SECONDARY_LLM_KEY=''
BROWSER_TYPE='chromium-headful'
MAX_SCRAPING_RETRIES='0'
VIDEO_PATH='./videos'
BROWSER_ACTION_TIMEOUT_MS='5000'
MAX_STEPS_PER_RUN='50'
LOG_LEVEL='INFO'
LITELLM_LOG='CRITICAL'
DATABASE_STRING='postgresql+psycopg://skyvern@localhost/skyvern'
PORT='8000'
...
SKYVERN_BASE_URL='http://localhost:8000'
SKYVERN_API_KEY='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjQ5MTMzODQ2MDksInN1YiI6Im9fNDg0MjIwNjY3MzYzNzA2Njk4In0.Crwy0-y7hpMVSyhzNJGzDu_oaMvrK76RbRb7YhSo3YA'
```
### Start the local server
```bash
skyvern run server
```
<p align="center">
<img src="/images/skyvern-run-server.gif" alt="Skyvern local server logs" />
</p>
### Run a task locally
The only difference from cloud is the `base_url` parameter pointing to your local server. The API is identical, so the same code works in both environments — develop locally, deploy to cloud without changes.
```python
import os
import asyncio
from skyvern import Skyvern
async def main():
client = Skyvern(
base_url="http://localhost:8000",
api_key=os.getenv("SKYVERN_API_KEY")
)
result = await client.run_task(
prompt="Go to news.ycombinator.com and get the title of the #1 post",
url="https://news.ycombinator.com",
)
print(f"Run ID: {result.run_id}")
asyncio.run(main())
```
A browser window will open on your machine (if you chose headful mode). Recordings and logs are saved in the directory where you started the server.
<video style={{ aspectRatio: '16 / 9', width: '100%' }} controls>
<source src="https://github.com/naman06dev/skyvern-docs/raw/8706c85d746e2a6870d81b6a95eaa511ad66dbf8/fern/images/skyvern-agent-local.mp4" type="video/mp4" />
</video>
---
## Next steps
<CardGroup cols={2}>
<Card
title="Extract Structured Data"
icon="database"
href="/running-tasks/run-tasks"
>
Define a schema to get typed JSON output from your automations
</Card>
<Card
title="Handle Logins"
icon="key"
href="/credentials/introduction"
>
Store credentials securely for sites that require authentication
</Card>
<Card
title="Build Workflows"
icon="diagram-project"
href="/workflows/manage-workflows"
>
Chain multiple steps together for complex automations
</Card>
<Card
title="Use Webhooks"
icon="webhook"
href="/running-tasks/webhooks-faq"
>
Get notified when tasks complete instead of polling
</Card>
</CardGroup>