cloud ui docs + cookbooks (#4759)

Co-authored-by: Ritik Sahni <ritiksahni0203@gmail.com>
Co-authored-by: Kunal Mishra <kunalm2345@gmail.com>
Naman
2026-02-17 03:44:40 +05:30
committed by GitHub
parent 06fe51adfa
commit bf8c7de8f9
52 changed files with 4211 additions and 471 deletions


@@ -0,0 +1,80 @@
---
title: Watching Live Execution
subtitle: Monitor, interact with, and control running tasks
slug: cloud/monitor-a-run
---
When you run a task from the [Discover page](/cloud/getting-started/run-a-task), you're taken to the live execution screen where you can watch the browser in real time.
<img src="/images/cloud/live-execution-overview.png" alt="Live execution screen" />
---
## The execution screen
The execution view has three panels:
| Panel | What it shows |
|-------|---------------|
| **Left: Task configuration** | The block being executed, its URL, and prompt. A status badge shows the current state. |
| **Center: Live browser** | Real-time view of the browser. You see pages load, forms fill, and buttons click. |
| **Right: Agent logs** | Real-time LLM reasoning and action decisions. Shows why the AI made each choice. |
---
## When the live view is available
The live browser stream is active while the task is still in progress:
| Status | Live view |
|--------|-----------|
| `created` | Waiting to start |
| `queued` | Waiting for a browser |
| `running` | **Active**: the browser is navigating |
| `paused` | Waiting for human interaction |
| `completed` | Stream closed. View the recording instead. |
| `failed` | Stream closed. View the recording instead. |
| `terminated` | Stream closed. View the recording instead. |
| `timed_out` | Stream closed. View the recording instead. |
| `canceled` | Stream closed. View the recording instead. |
Once a task reaches a final state, the live stream closes. Open the run from **Runs** in the sidebar to access the full recording, screenshots, and action history.
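
If you process run statuses in your own tooling, the table above reduces to a small lookup. The sketch below is a hypothetical helper, not part of any Skyvern SDK; only the status names come from the table.

```python
# Status names are copied from the table above; the helper is illustrative.
IN_PROGRESS_STATUSES = {"created", "queued", "running", "paused"}
FINAL_STATUSES = {"completed", "failed", "terminated", "timed_out", "canceled"}


def where_to_look(status: str) -> str:
    """Return 'live' while the run is in progress, 'recording' once it is final."""
    if status in IN_PROGRESS_STATUSES:
        return "live"
    if status in FINAL_STATUSES:
        return "recording"
    raise ValueError(f"unknown run status: {status!r}")


for status in ("queued", "running", "completed", "canceled"):
    print(status, "->", where_to_look(status))
```
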
---
## Taking control of the browser
The **Take Control** button lets you interact directly with the browser. This is useful when:
- A CAPTCHA appears that the AI can't solve
- The site has an unusual login flow
- You need to navigate past an unexpected popup
Click **Take Control** to start interacting. Your mouse and keyboard input goes directly to the browser. Click **Stop Controlling** to hand control back to the AI.
<Warning>
Taking control pauses the AI agent. Remember to release control so the agent can resume.
</Warning>
---
## Stopping a running task
You can cancel a task at any time while it's running or queued. Click the **Cancel** button in the task header, then confirm in the dialog that appears. The task transitions to `canceled`, and any configured webhook fires with the `canceled` status.
<Note>
Credits for actions already taken are still consumed. Canceling stops future actions but does not refund past ones.
</Note>
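
A webhook receiver can tell a canceled run from a finished one by reading the status in the request body. The following is a minimal sketch using only the Python standard library. It assumes the payload is a JSON object with a top-level `status` key; that key name, the handler, and the port are assumptions for illustration, so check them against the payload you actually receive. Calling `serve()` starts a local listener for testing.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_payload(raw: bytes) -> str:
    """Turn a webhook body into a one-line summary.

    Assumes a JSON object with a top-level "status" key (an assumption).
    """
    payload = json.loads(raw)
    status = payload.get("status", "unknown")
    if status == "canceled":
        return "run was canceled before finishing"
    return f"run ended with status {status}"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly the number of bytes the sender declared.
        length = int(self.headers.get("Content-Length", 0))
        print(summarize_payload(self.rfile.read(length)))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen on localhost; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()
```
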
---
## Reviewing results
Once a task finishes, open it from **Runs** to see the full results. The run detail page has five tabs:
- **Overview**: The AI's reasoning timeline alongside browser screenshots. Each Thought, Block, and Action card shows what the agent saw and why it acted.
- **Output**: The complete JSON output and any downloaded files.
- **Parameters**: The configuration you submitted: URL, prompt, engine, proxy location, webhook URL, data schema, and other settings.
- **Recording**: Full video replay of the browser session. Every task is recorded automatically.
- **Code**: Auto-generated Python code to reproduce this task via the API or SDK (when code generation is enabled).
For a full walkthrough of each tab, see [Run Details](/cloud/viewing-results/run-details).


@@ -0,0 +1,81 @@
---
title: UI Overview
slug: cloud/overview
subtitle: Navigate the Skyvern Cloud dashboard
---
Skyvern Cloud ([app.skyvern.com](https://app.skyvern.com)) lets you automate any website from your browser. Describe what you want in plain English, watch an AI-powered browser do it live, and get structured results back. No code required.
<Note>
Looking to integrate Skyvern into your own app? See the [API Quickstart](/getting-started/quickstart) instead.
</Note>
## The dashboard
Sign in and you'll land on the **Discover** page, the starting point for running automations.
<img src="/images/cloud/skyvern-cloud-discover.png" alt="Skyvern Cloud dashboard showing the Discover page" />
The left sidebar is your navigation hub. Here's what each section does:
### Build
Where you create and monitor automations.
| Page | Purpose |
|------|---------|
| **Discover** | Run one-off tasks. Type your instructions and target URL into a single prompt, pick an engine, and hit send. |
| **Workflows** | Build multi-step automations with the visual workflow editor. Add loops, conditionals, and data passing between steps. |
| **Runs** | Execution history for every task and workflow. Filter by status, drill into any run to see actions, recordings, and extracted data. |
| **Browsers** | Active browser sessions. Useful for persistent sessions that keep login state across tasks. |
### Agents
Ready-made automation templates. Each agent is preconfigured with a prompt, target URL, and settings. Pick one to see it work or use it as a starting point for your own task.
### General
| Page | Purpose |
|------|---------|
| **Billing** | Usage, remaining credits, and plan management. |
| **Credentials** | Store website logins securely. Skyvern uses these to authenticate automatically when it encounters a login page. |
| **Settings** | API key, account preferences, and organization management. |
## How it works
Every automation in Skyvern Cloud follows the same pattern:
<Steps>
<Step title="Describe your task">
Type what you want into the prompt bar. Include the target URL and your instructions in one go. Something like "Get the top post from https://news.ycombinator.com" or "Fill out the contact form at https://example.com/contact with my details."
</Step>
<Step title="Watch it happen">
A cloud browser opens and you see it navigate in real time. Pages load, elements highlight, actions fire. An agent log streams the AI's reasoning — what it sees on the page, what it plans to do, and why — so you can follow along. If the AI gets stuck, hit **Take Control** to jump in and help.
</Step>
<Step title="Get your results">
Extracted data appears as structured JSON on the run detail page. Every run also includes an output view, full recording, the parameters you submitted, and auto-generated code to reproduce the task via API.
</Step>
</Steps>
That's it. The next guide walks you through this flow with a real example.
---
## Next steps
<CardGroup cols={2}>
<Card
title="Run Your First Task"
icon="play"
href="/cloud/getting-started/run-your-first-task"
>
Follow along with a real example to see Skyvern Cloud in action
</Card>
<Card
title="Core Concepts"
icon="book"
href="/getting-started/core-concepts"
>
Understand tasks, workflows, and other foundational concepts
</Card>
</CardGroup>


@@ -0,0 +1,185 @@
---
title: The Discover Page
subtitle: Run ad-hoc browser automations with natural language
slug: cloud/run-a-task
---
The **Discover** page is where you run one-off browser automations. Type what you want done in plain language, and Skyvern opens a browser and does it for you.
<img src="/images/cloud/discover-page-overview.png" alt="Discover page overview" />
---
## The prompt box
Type a natural language instruction describing what you want automated. Be specific about the goal and any data you want extracted.
**Examples:**
- "Go to amazon.com and find the price of the MacBook Air M4"
- "Fill out the contact form at example.com/contact with name John Doe and email john@example.com"
- "Get an insurance quote from geico.com for a 2020 Toyota Camry"
<img src="/images/cloud/prompt-box-filled.png" alt="Prompt box with a sample prompt" />
Click the **send button** or press **Enter** to start.
Below the prompt box, **quick-action buttons** offer pre-built examples like "Add a product to cart" or "Get an insurance quote." Click one to run it immediately or use it as a starting point.
---
## Choosing an engine
The dropdown next to the send button controls which engine runs the task.
| Engine | Best for |
|--------|----------|
| **Skyvern 2.0 with Code** | Complex, multi-step tasks. Generates reusable scripts. **(Default)** |
| **Skyvern 2.0** | Complex tasks without script generation |
| **Skyvern 1.0** | Simple, single-objective tasks. Faster and cheaper. |
<Tip>
Start with the default. Switch to Skyvern 1.0 when you have a straightforward, single-page task and want faster execution.
</Tip>
---
## Advanced settings
Click the **gear icon** next to the prompt box to expand the settings panel.
<img src="/images/cloud/advanced-settings-panel.png" alt="Advanced settings panel" />
| Setting | What it does |
|---------|-------------|
| **Proxy Location** | Route the browser through a residential proxy in a specific country. Default is `RESIDENTIAL` (US). Set to `NONE` to disable. Available: US, UK, Germany, France, Spain, Ireland, India, Japan, Australia, Canada, Brazil, Mexico, Argentina, New Zealand, South Africa, Italy, Netherlands, Philippines, Turkey. |
| **Webhook URL** | URL that receives a POST request when the task finishes. The payload includes status, extracted data, screenshots, and recording URL. |
| **Browser Session ID** | Run inside an existing persistent browser session (`pbs_xxx`). Preserves cookies and login state across multiple tasks. |
| **CDP Address** | Connect to your own browser via Chrome DevTools Protocol (e.g., `http://127.0.0.1:9222`). For local development. |
| **2FA Identifier** | Links your TOTP credentials to this task. Skyvern uses it to retrieve the correct code when a 2FA prompt appears. |
| **Extra HTTP Headers** | Custom headers sent with every browser request, as JSON (e.g., `{"Authorization": "Bearer token"}`). |
| **Publish Workflow** | Save a reusable workflow alongside the task run. Re-run the same automation later from the Workflows page. |
| **Max Steps Override** | Cap the number of AI reasoning steps. Each step = one screenshot-analyze-act cycle. Useful for controlling cost during development. |
| **Max Screenshot Scrolls** | Number of scrolls for post-action screenshots. Increase for pages with lazy-loaded content. `0` = viewport only. |
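
Because **Extra HTTP Headers** expects JSON, it can help to check the text before pasting it in. The helper below is a hypothetical pre-flight check, not something Skyvern provides. Requiring every value to be a string is an assumption based on how HTTP headers work in general.

```python
import json


def check_headers_json(text: str) -> dict[str, str]:
    """Parse text and confirm it is a flat JSON object with string values."""
    headers = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(headers, dict):
        raise ValueError("headers must be a JSON object")
    for name, value in headers.items():
        if not isinstance(value, str):
            raise ValueError(f"header {name!r} must have a string value")
    return headers


print(check_headers_json('{"Authorization": "Bearer token"}'))
```
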
---
## Data extraction schema
The **Data Schema** field in advanced settings lets you define the structure of extracted output as [JSON Schema](https://json-schema.org/).
Without a schema, the AI returns data in whatever format it chooses. With a schema, output conforms to your structure, making it predictable for downstream use.
<img src="/images/cloud/data-schema-field.png" alt="Data schema field with JSON" />
```json
{
"type": "object",
"properties": {
"product_name": {
"type": "string",
"description": "The name of the product"
},
"price": {
"type": "number",
"description": "The price in USD"
},
"in_stock": {
"type": "boolean",
"description": "Whether the product is in stock"
}
}
}
```
Use the `description` field on each property to guide the AI on what to extract.
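
Downstream code can confirm that a result matches the schema before using it. The sketch below hand-rolls a check for the flat schema above rather than using a JSON Schema library, and it only understands the three types that schema uses. It also flags missing properties, which is stricter than the schema itself since the schema has no `required` list. The sample outputs in the usage lines are made up.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
}

# Maps the JSON Schema type names used above to Python types.
JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def mismatches(output: dict, schema: dict) -> list[str]:
    """List the properties that are missing or have the wrong type."""
    problems = []
    for name, spec in schema["properties"].items():
        if name not in output:
            problems.append(f"{name}: missing")
            continue
        value = output[name]
        # bool is a subclass of int in Python, so reject it for "number".
        bool_as_number = spec["type"] == "number" and isinstance(value, bool)
        if bool_as_number or not isinstance(value, JSON_TYPES[spec["type"]]):
            problems.append(f"{name}: expected {spec['type']}")
    return problems


print(mismatches({"product_name": "Mouse", "price": 19.99, "in_stock": True}, SCHEMA))
print(mismatches({"product_name": "Mouse", "price": "19.99"}, SCHEMA))
```
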
<Accordion title="Example: Extracting a list of items">
```json
{
"type": "object",
"properties": {
"quotes": {
"type": "array",
"items": {
"type": "object",
"properties": {
"premium_amount": {
"type": "string",
"description": "Total premium in USD (e.g., '$321.57')"
},
"coverage_type": {
"type": "string",
"description": "Type of coverage (e.g., 'Full Coverage')"
},
"deductible": {
"type": "string",
"description": "Deductible amount"
}
}
}
}
}
}
```
</Accordion>
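
The list schema returns `premium_amount` as a string such as `'$321.57'`, so comparing quotes means converting it first. This is a minimal sketch of that step. The sample quotes are invented, and the parser assumes a plain dollar amount with optional thousands separators.

```python
from decimal import Decimal


def parse_usd(text: str) -> Decimal:
    """Convert a string such as '$1,321.57' to a Decimal."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def cheapest(quotes: list[dict]) -> dict:
    """Return the quote with the lowest premium."""
    return min(quotes, key=lambda quote: parse_usd(quote["premium_amount"]))


# Hypothetical output shaped like the schema above.
sample = {
    "quotes": [
        {"premium_amount": "$321.57", "coverage_type": "Full Coverage", "deductible": "$500"},
        {"premium_amount": "$198.40", "coverage_type": "Liability", "deductible": "$1,000"},
    ]
}
print(cheapest(sample["quotes"])["coverage_type"])
```
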
---
## Workflow templates
Below the prompt box, the Discover page shows a gallery of **workflow templates**: pre-built automations for common use cases.
<img src="/images/cloud/workflow-templates.png" alt="Workflow template gallery" />
Click any template to launch it with pre-filled configuration, or use it as a starting point and customize.
---
## Tips for better results
**Write specific prompts.** Include the exact goal, target fields, and what "done" looks like.
| Instead of | Write |
|-----------|-------|
| "Get some data from this site" | "Extract the product name, price, and availability from the first 5 results on amazon.com/s?k=wireless+mouse" |
| "Fill out the form" | "Fill the contact form at example.com/contact with name 'Jane Doe', email 'jane@example.com', and message 'Demo request'" |
**Control cost with Max Steps.** Set **Max Steps Override** to a reasonable limit (e.g., 10-20 for simple tasks) during development. Each step consumes one credit. Remove the cap once you've confirmed the task works.
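
Since each step consumes one credit, the step cap also bounds what a batch of trial runs can cost. A rough sketch of that arithmetic follows; the function name and the example numbers are illustrative only.

```python
def worst_case_credits(max_steps: int, runs: int = 1) -> int:
    """Upper bound on credits if every run uses all of its allowed steps."""
    if max_steps < 1 or runs < 1:
        raise ValueError("max_steps and runs must be positive")
    return max_steps * runs


# Example: 8 trial runs capped at 15 steps each.
print(worst_case_credits(15, runs=8))  # prints 120
```

Actual usage is usually lower, because a run stops as soon as the task completes.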
**Debug failures in order.** If a task fails or produces wrong results:
1. Check the **Failure Reason** at the top of the run detail page
2. Read the **Thought cards** in the Overview timeline to find where the AI went off track
3. Watch the **Recording** to see what actually happened on screen
4. Review **Parameters** to confirm the inputs were correct
---
## What happens next
1. Your prompt is sent to Skyvern
2. A cloud browser opens and navigates to the target URL (or finds one from your prompt)
3. The AI analyzes the page, plans actions, and executes them step by step
4. You're taken to the [live execution view](/cloud/getting-started/monitor-a-run) where you can watch it happen in real time
5. When complete, results appear on the run detail page under **Runs**
---
## Next steps
<CardGroup cols={2}>
<Card
title="Watching Live Execution"
icon="eye"
href="/cloud/getting-started/monitor-a-run"
>
Monitor runs, take control of the browser, and review results
</Card>
<Card
title="Build a Workflow"
icon="diagram-project"
href="/cloud/building-workflows/build-a-workflow"
>
Turn a successful task into a reusable multi-step workflow
</Card>
</CardGroup>


@@ -0,0 +1,133 @@
---
title: Your First Task
slug: cloud/run-your-first-task
subtitle: Run a browser automation from start to finish
---
Let's run a real automation. You'll tell Skyvern to visit a website, extract data, and return it as JSON, then watch the whole run happen live.
## Step 1: Write your prompt
Open [app.skyvern.com](https://app.skyvern.com) and you'll land on the **Discover** page.
<img src="/images/cloud/skyvern-cloud-discover.png" alt="Discover page with a prompt entered" />
The Discover page has a single input field. Type your instructions and include the target URL in the same prompt. For this example, enter:
```
Get the title of the #1 post on the front page for https://news.ycombinator.com
```
That's it. Skyvern parses the URL and figures out how to navigate the page and extract the data.
Below the input, you'll see quick-action chips like "Add a product to cart" and "What's the top post on hackernews". Click any of these to try a pre-filled example instead.
<Tip>
The more specific your prompt, the better. "Get the title of the #1 post" works much better than "get some data." Include the exact fields you want, what success looks like, and any constraints.
</Tip>
## Step 2: Pick an engine and run
Next to your prompt, you'll see an engine selector. Click it to switch engines:
| Engine | When to use it |
|--------|---------------|
| **Skyvern 1.0** | Tasks with a simple, single goal: filling a form, searching for information on Google, reading content from a page |
| **Skyvern 2.0** | Complex, multi-step tasks. Scores a state-of-the-art 85.85% on the WebVoyager benchmark |
| **Skyvern 2.0 with Code** | The default engine. Same capabilities as Skyvern 2.0, plus auto-generates reusable code and a workflow from the run |
For this example, keep the default **Skyvern 2.0 with Code** selected.
Click the **send button** (arrow icon to the right of the input). Skyvern generates a workflow from your prompt and opens it in the workflow editor. Click **Run** in the top right, confirm the parameters, then click **Run workflow** to start execution.
<Accordion title="Optional: Advanced settings">
Click the **gear icon** next to send to configure additional options before running:
| Setting | What it does |
|---------|-------------|
| **Webhook Callback URL** | Endpoint to receive the extracted data when the run completes |
| **Proxy Location** | Route Skyvern through one of the available proxies |
| **Browser Session ID** | Reuse a persistent browser session to keep login state |
| **CDP Address** | Connect to your own browser via Chrome DevTools Protocol |
| **2FA Identifier** | Identifier for a 2FA code to handle two-factor auth automatically |
| **Extra HTTP Headers** | Custom HTTP request headers, as a JSON object |
| **Generate Script** | Auto-generate reusable scripts from a successful run |
| **Publish Workflow** | Create a workflow alongside this task run |
| **Max Steps Override** | Cap the number of steps the AI can take |
| **Data Schema** | Define structured JSON output format |
| **Max Screenshot Scrolls** | Limit scrolls for post-action screenshots (default: 3) |
These are all optional. The defaults work for most tasks.
</Accordion>
## Step 3: Watch the live browser
This is where it gets interesting. Once the task starts, you'll see the run detail page with a live view of the browser:
<img src="/images/cloud/discover-prompt-in-process.png" alt="Run detail page showing a live browser navigating Hacker News" />
On the left is the **live browser view**, where you'll see pages load, elements highlight, and actions fire.
On the right is the **agent log**, a running stream of the AI's Thoughts, Decisions, and block executions. If something goes wrong, this is where you'll figure out why.
## Step 4: Review the results
When the task finishes, the status badge flips to **completed** and the extracted data appears at the top of the page.
<img src="/images/cloud/discover-workflow-completed.png" alt="Completed run showing extracted data and result tabs" />
### Extracted data
The **Extracted Information** block shows your results as structured JSON:
```json
[
{
"top_post_title": "Don't rent the cloud, own instead"
}
]
```
Your result will differ — the #1 post changes constantly. The structure is what matters.
The agent log on the right confirms what happened. You'll see a final Thought summarizing the result.
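
If you copy the JSON out of the **Extracted Information** block, reading it takes a few lines. The sketch below assumes the output has the shape shown above, a list containing one object with a `top_post_title` key. Without a data schema the AI chooses the format, so the key name may differ in your run.

```python
import json
from typing import Optional

RAW_OUTPUT = """
[
  {
    "top_post_title": "Don't rent the cloud, own instead"
  }
]
"""


def first_title(raw: str) -> Optional[str]:
    """Return the title from the first record, or None if it is absent."""
    records = json.loads(raw)
    if not records:
        return None
    return records[0].get("top_post_title")


print(first_title(RAW_OUTPUT))
```
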
### Tabs
Below the extracted data, five tabs give you different views of the run:
- **Overview**: The AI's reasoning timeline alongside browser screenshots. Each Thought, Block, and Action card shows what the agent saw and why it acted.
- **Output**: The complete JSON output and any downloaded files.
- **Parameters**: The exact configuration that was submitted (URL, prompt, engine, schema). Useful for reproducing or tweaking the run.
- **Recording**: Full video replay of the browser session, start to finish.
- **Code**: Auto-generated Python code to reproduce this task via the API or SDK.
## Try something bigger
Now that you've seen the basic flow, here are a few ideas to try next:
- **Fill a form**: Point Skyvern at a contact form and tell it what to enter in each field
- **Compare prices**: Extract product names and prices from an e-commerce page using a data schema
- **Navigate a flow**: Use Skyvern 2.0 to walk through a multi-page checkout or signup process
- **Use an Agent template**: Check the **Agents** section in the sidebar for pre-built automations you can run instantly
---
## Next steps
<CardGroup cols={2}>
<Card
title="Run a Task via API"
icon="code"
href="/running-automations/run-a-task"
>
Trigger automations programmatically with the Skyvern API
</Card>
<Card
title="Core Concepts"
icon="book"
href="/getting-started/core-concepts"
>
Understand tasks, workflows, and other building blocks
</Card>
</CardGroup>