feat: new self-hosting docs (#4689)
Co-authored-by: Ritik Sahni <ritiksahni0203@gmail.com>
---
title: LLM Configuration
subtitle: Connect your preferred language model provider
slug: self-hosted/llm-configuration
---

Skyvern uses LLMs to analyze screenshots and decide what actions to take. You'll need to configure at least one LLM provider before running tasks.

## How Skyvern uses LLMs

Skyvern makes multiple LLM calls per task step:

1. **Screenshot analysis**: Identify interactive elements on the page
2. **Action planning**: Decide what to click, type, or extract
3. **Result extraction**: Parse data from the page into structured output

A task that runs for 10 steps makes roughly 30 or more LLM calls. Choose your provider and model tier with this in mind.

For most deployments, configure a single provider using `LLM_KEY`. Skyvern also supports a `SECONDARY_LLM_KEY` for lighter tasks to reduce costs.

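That estimate is simple arithmetic, sketched below: three calls per step in the steady state (retries and sub-steps add more):

```shell
# Minimum LLM calls for a task: steps × calls per step.
steps=10
calls_per_step=3
echo $((steps * calls_per_step))   # prints 30
```
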
---

## OpenAI

The most common choice. Requires an API key from [platform.openai.com](https://platform.openai.com/).

```bash .env
ENABLE_OPENAI=true
OPENAI_API_KEY=sk-...
LLM_KEY=OPENAI_GPT4O
```

### Available models

| LLM_KEY | Model | Notes |
|---------|-------|-------|
| `OPENAI_GPT4O` | gpt-4o | Recommended for most use cases |
| `OPENAI_GPT4O_MINI` | gpt-4o-mini | Cheaper, less capable |
| `OPENAI_GPT4_1` | gpt-4.1 | Latest GPT-4 family |
| `OPENAI_GPT4_1_MINI` | gpt-4.1-mini | Cheaper GPT-4.1 variant |
| `OPENAI_O3` | o3 | Reasoning model |
| `OPENAI_O3_MINI` | o3-mini | Cheaper reasoning model |
| `OPENAI_GPT4_TURBO` | gpt-4-turbo | Previous generation |
| `OPENAI_GPT4V` | gpt-4-turbo | Legacy alias for gpt-4-turbo |

### Optional settings

```bash .env
# Use a custom API endpoint (for proxies or compatible services)
OPENAI_API_BASE=https://your-proxy.com/v1

# Specify organization ID
OPENAI_ORGANIZATION=org-...
```

---

## Anthropic

Claude models from [anthropic.com](https://www.anthropic.com/).

```bash .env
ENABLE_ANTHROPIC=true
ANTHROPIC_API_KEY=sk-ant-...
LLM_KEY=ANTHROPIC_CLAUDE3.5_SONNET
```

### Available models

| LLM_KEY | Model | Notes |
|---------|-------|-------|
| `ANTHROPIC_CLAUDE4.5_SONNET` | claude-4.5-sonnet | Latest Sonnet |
| `ANTHROPIC_CLAUDE4.5_OPUS` | claude-4.5-opus | Most capable |
| `ANTHROPIC_CLAUDE4_SONNET` | claude-4-sonnet | Claude 4 |
| `ANTHROPIC_CLAUDE4_OPUS` | claude-4-opus | Claude 4 Opus |
| `ANTHROPIC_CLAUDE3.7_SONNET` | claude-3-7-sonnet | Previous generation |
| `ANTHROPIC_CLAUDE3.5_SONNET` | claude-3-5-sonnet | Previous generation |
| `ANTHROPIC_CLAUDE3.5_HAIKU` | claude-3-5-haiku | Cheap and fast |

---

## Azure OpenAI

Microsoft-hosted OpenAI models. Requires an Azure subscription with the Azure OpenAI service provisioned.

```bash .env
ENABLE_AZURE=true
LLM_KEY=AZURE_OPENAI
AZURE_DEPLOYMENT=your-deployment-name
AZURE_API_KEY=your-azure-api-key
AZURE_API_BASE=https://your-resource.openai.azure.com/
AZURE_API_VERSION=2024-08-01-preview
```

### Setup steps

1. Create an Azure OpenAI resource in the [Azure Portal](https://portal.azure.com)
2. Open the Azure AI Foundry portal from your resource's overview page
3. Go to **Shared Resources** → **Deployments**
4. Click **Deploy Model** → **Deploy Base Model** → select GPT-4o or GPT-4
5. Note the **Deployment Name**. Use this for `AZURE_DEPLOYMENT`
6. Copy your API key and endpoint from the Azure Portal

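If you prefer scripting to the portal, the steps above map roughly onto the Azure CLI. This is a sketch, not an official recipe: the resource group, region, account name, and deployment name are placeholders of ours, and the available model version varies by region.

```shell
# Placeholders throughout: my-openai-resource, my-rg, and skyvern-gpt4o are example names.
az cognitiveservices account create \
  --name my-openai-resource \
  --resource-group my-rg \
  --location eastus \
  --kind OpenAI \
  --sku S0

# The deployment name chosen here is your AZURE_DEPLOYMENT value.
az cognitiveservices account deployment create \
  --name my-openai-resource \
  --resource-group my-rg \
  --deployment-name skyvern-gpt4o \
  --model-name gpt-4o \
  --model-version "2024-08-06" \
  --model-format OpenAI \
  --sku-capacity 10 \
  --sku-name Standard

# Key and endpoint for AZURE_API_KEY / AZURE_API_BASE.
az cognitiveservices account keys list --name my-openai-resource --resource-group my-rg
az cognitiveservices account show --name my-openai-resource --resource-group my-rg \
  --query properties.endpoint
```
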
<Note>
The `AZURE_DEPLOYMENT` is the name you chose when deploying the model, not the model name itself.
</Note>

---

## Google Gemini

Gemini models through [Vertex AI](https://cloud.google.com/vertex-ai). Requires a GCP project with Vertex AI enabled.

```bash .env
ENABLE_VERTEX_AI=true
LLM_KEY=VERTEX_GEMINI_3.0_FLASH
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
GCP_PROJECT_ID=your-gcp-project-id
GCP_REGION=us-central1
```

### Setup steps

1. Create a [GCP project](https://console.cloud.google.com/) with billing enabled
2. Enable the **Vertex AI API** in your project
3. Create a service account with the **Vertex AI User** role
4. Download the service account JSON key file
5. Set `GOOGLE_APPLICATION_CREDENTIALS` to the path of that file

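The same steps can be sketched with the gcloud CLI. The service-account name `skyvern-llm` is a placeholder of ours, not something Skyvern requires:

```shell
# Enable Vertex AI in the project (project ID is a placeholder).
gcloud services enable aiplatform.googleapis.com --project your-gcp-project-id

gcloud iam service-accounts create skyvern-llm --project your-gcp-project-id

# Grant the Vertex AI User role.
gcloud projects add-iam-policy-binding your-gcp-project-id \
  --member "serviceAccount:skyvern-llm@your-gcp-project-id.iam.gserviceaccount.com" \
  --role "roles/aiplatform.user"

# The key file path is what GOOGLE_APPLICATION_CREDENTIALS points at.
gcloud iam service-accounts keys create /path/to/service-account.json \
  --iam-account skyvern-llm@your-gcp-project-id.iam.gserviceaccount.com
```
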
### Available models

| LLM_KEY | Model | Notes |
|---------|-------|-------|
| `VERTEX_GEMINI_3.0_FLASH` | gemini-3-flash-preview | Recommended |
| `VERTEX_GEMINI_2.5_PRO` | gemini-2.5-pro | Stable |
| `VERTEX_GEMINI_2.5_FLASH` | gemini-2.5-flash | Cheaper, faster |

---

## Amazon Bedrock

Run Anthropic Claude through your AWS account.

```bash .env
ENABLE_BEDROCK=true
LLM_KEY=BEDROCK_ANTHROPIC_CLAUDE3.5_SONNET
AWS_REGION=us-west-2
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
```

### Setup steps

1. Create an IAM user with the `AmazonBedrockFullAccess` policy
2. Generate access keys for the IAM user
3. In the [Bedrock console](https://console.aws.amazon.com/bedrock/), go to **Model Access**
4. Enable access to Claude 3.5 Sonnet

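Steps 1-2 can also be done from the AWS CLI (a sketch; `skyvern-bedrock` is a placeholder user name, and the model-access grant in steps 3-4 still happens in the console):

```shell
# Requires an already-authenticated AWS CLI session.
aws iam create-user --user-name skyvern-bedrock

aws iam attach-user-policy \
  --user-name skyvern-bedrock \
  --policy-arn arn:aws:iam::aws:policy/AmazonBedrockFullAccess

# The response contains the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY pair for .env.
aws iam create-access-key --user-name skyvern-bedrock
```
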
### Available models

| LLM_KEY | Model |
|---------|-------|
| `BEDROCK_ANTHROPIC_CLAUDE3.5_SONNET` | Claude 3.5 Sonnet v2 |
| `BEDROCK_ANTHROPIC_CLAUDE3.5_SONNET_V1` | Claude 3.5 Sonnet v1 |
| `BEDROCK_ANTHROPIC_CLAUDE3.7_SONNET_INFERENCE_PROFILE` | Claude 3.7 Sonnet (cross-region) |
| `BEDROCK_ANTHROPIC_CLAUDE4_SONNET_INFERENCE_PROFILE` | Claude 4 Sonnet (cross-region) |
| `BEDROCK_ANTHROPIC_CLAUDE4.5_SONNET_INFERENCE_PROFILE` | Claude 4.5 Sonnet (cross-region) |

<Note>
Bedrock inference profile keys (`*_INFERENCE_PROFILE`) use cross-region inference and require `AWS_REGION` only. No access keys are needed if running on an IAM-authenticated instance.
</Note>

---

## Ollama (Local Models)

Run open-source models locally with [Ollama](https://ollama.ai/). No API costs, but requires sufficient local compute.

```bash .env
ENABLE_OLLAMA=true
LLM_KEY=OLLAMA
OLLAMA_MODEL=llama3.1
OLLAMA_SERVER_URL=http://host.docker.internal:11434
OLLAMA_SUPPORTS_VISION=false
```

### Setup steps

1. [Install Ollama](https://ollama.ai/download)
2. Pull a model: `ollama pull llama3.1`
3. Start Ollama: `ollama serve`
4. Configure Skyvern to connect

<Warning>
Most Ollama models don't support vision. Set `OLLAMA_SUPPORTS_VISION=false`. Without vision, Skyvern relies on DOM analysis instead of screenshot analysis, which may reduce accuracy on complex pages.
</Warning>

### Docker networking

When running Skyvern in Docker and Ollama on the host:

| Host OS | OLLAMA_SERVER_URL |
|---------|-------------------|
| macOS/Windows | `http://host.docker.internal:11434` |
| Linux | `http://172.17.0.1:11434` (Docker bridge IP) |

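To confirm the container can actually reach Ollama, query its model-list endpoint from both sides. This assumes Ollama's default port 11434; `/api/tags` returns the models you've pulled, and `skyvern` is a placeholder for your container or compose service name:

```shell
# On the host: should print JSON listing the models you've pulled.
curl -s http://localhost:11434/api/tags

# From inside the Skyvern container (on Linux, swap in the bridge IP from the table above).
docker exec skyvern curl -s http://host.docker.internal:11434/api/tags
```

An empty `models` list means Ollama is reachable but you haven't pulled the model named in `OLLAMA_MODEL` yet.
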
---

## OpenAI-Compatible Endpoints

Connect to any service that implements the OpenAI API format, including LiteLLM, LocalAI, vLLM, and text-generation-inference.

```bash .env
ENABLE_OPENAI_COMPATIBLE=true
OPENAI_COMPATIBLE_MODEL_NAME=llama3.1
OPENAI_COMPATIBLE_API_KEY=sk-test
OPENAI_COMPATIBLE_API_BASE=http://localhost:4000/v1
LLM_KEY=OPENAI_COMPATIBLE
```

This is useful for:

- Running local models with a unified API
- Using LiteLLM as a proxy to switch between providers
- Connecting to self-hosted inference servers

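As one possible setup matching the `.env` above, a LiteLLM proxy can front a local Ollama model on port 4000. This is a sketch under the assumption that LiteLLM's proxy extra is installed via pip; check LiteLLM's own docs for current flags:

```shell
pip install 'litellm[proxy]'

# Serve an OpenAI-compatible API on :4000, backed by a local Ollama model.
litellm --model ollama/llama3.1 --port 4000

# In another terminal: an OpenAI-style request against the proxy.
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-test" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1", "messages": [{"role": "user", "content": "hello"}]}'
```
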
---

## OpenRouter

Access multiple models through a single API at [openrouter.ai](https://openrouter.ai/).

```bash .env
ENABLE_OPENROUTER=true
LLM_KEY=OPENROUTER
OPENROUTER_API_KEY=sk-or-...
OPENROUTER_MODEL=mistralai/mistral-small-3.1-24b-instruct
```

---

## Groq

Inference on open-source models at [groq.com](https://groq.com/).

```bash .env
ENABLE_GROQ=true
LLM_KEY=GROQ
GROQ_API_KEY=gsk_...
GROQ_MODEL=llama-3.1-8b-instant
```

<Note>
Groq specializes in fast inference for open-source models. Response times are typically much faster than other providers, but model selection is limited.
</Note>

---

## Using multiple models

### Primary and secondary models

Configure a cheaper model for lightweight operations:

```bash .env
# Main model for complex decisions
LLM_KEY=OPENAI_GPT4O

# Cheaper model for simple tasks like dropdown selection
SECONDARY_LLM_KEY=OPENAI_GPT4O_MINI
```

### Task-specific models

For fine-grained control, you can override models for specific operations:

```bash .env
# Model for data extraction from pages (defaults to LLM_KEY if not set)
EXTRACTION_LLM_KEY=ANTHROPIC_CLAUDE3.5_SONNET

# Model for generating code/scripts in code blocks (defaults to LLM_KEY if not set)
SCRIPT_GENERATION_LLM_KEY=OPENAI_GPT4O
```

Most deployments don't need task-specific models. Start with `LLM_KEY` and `SECONDARY_LLM_KEY`.

---

## Troubleshooting

### "To enable svg shape conversion, please set the Secondary LLM key"

Some operations require a secondary model. Set `SECONDARY_LLM_KEY` in your environment:

```bash .env
SECONDARY_LLM_KEY=OPENAI_GPT4O_MINI
```

### "Context window exceeded"

The page content is too large for the model's context window. Options:

- Use a model with a larger context (GPT-4o supports 128k tokens)
- Simplify your prompt to require less page analysis
- Start from a more specific URL with less content

### "LLM caller not found"

The configured `LLM_KEY` doesn't match any enabled provider. Verify:

1. The provider is enabled (`ENABLE_OPENAI=true`, etc.)
2. The `LLM_KEY` value matches a supported model name exactly
3. Model names are case-sensitive: `OPENAI_GPT4O`, not `openai_gpt4o`

### Container logs show authentication errors

Check your API key configuration:

- Ensure the key is set correctly without extra whitespace
- Verify the key hasn't expired or been revoked
- For Azure, ensure `AZURE_API_BASE` includes the full URL with `https://`

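A quick way to check a key outside Skyvern is to hit the provider's model-list endpoint directly (OpenAI shown here; a 401 means the key itself is bad, not Skyvern's configuration):

```shell
# Expects OPENAI_API_KEY in the environment; 200 means the key works, 401 means it doesn't.
curl -s -o /dev/null -w "%{http_code}\n" \
  https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```
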
### Slow response times

LLM calls typically take 2-10 seconds. Longer times may indicate:

- Network latency to the provider
- Rate limiting (the provider may be throttling requests)
- For Ollama, insufficient local compute resources

---

## Next steps

<CardGroup cols={2}>
  <Card title="Browser Configuration" icon="window" href="/self-hosted/browser">
    Configure browser modes, locales, and display settings
  </Card>
  <Card title="Docker Setup" icon="docker" href="/self-hosted/docker">
    Return to the main Docker setup guide
  </Card>
</CardGroup>