Skyvern Parsing Examples
Examples of using Skyvern to parse various websites.
Basic commands
1. Simple text extraction
curl -X POST http://localhost:8000/api/v1/tasks \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_TOKEN" \
-d '{
"url": "https://example.com",
"navigation_goal": "Navigate to the page and extract heading",
"data_extraction_goal": "Extract the main h1 heading",
"proxy_location": "NONE"
}'
2. Structured data extraction
curl -X POST http://localhost:8000/api/v1/tasks \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_TOKEN" \
-d '{
"url": "https://news.ycombinator.com",
"navigation_goal": "Extract top stories from Hacker News",
"data_extraction_goal": "Extract titles and URLs of top 5 stories",
"extracted_information_schema": {
"type": "object",
"properties": {
"stories": {
"type": "array",
"items": {
"type": "object",
"properties": {
"title": {"type": "string"},
"url": {"type": "string"},
"points": {"type": "number"}
}
}
}
}
},
"proxy_location": "NONE",
"max_steps_per_run": 10
}'
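The schema sent in extracted_information_schema can also be reused on the client side to sanity-check what comes back. The following is a minimal sketch, not part of Skyvern: check_against_schema is a hypothetical helper that understands only the object, array, string, and number types used in these examples, and it does not flag missing keys. A full JSON Schema validator covers far more.

```python
def check_against_schema(data, schema, path="$"):
    """Return a list of type mismatches between data and a small JSON Schema subset."""
    errors = []
    expected = schema.get("type")
    if expected == "object":
        if not isinstance(data, dict):
            return [f"{path}: expected object"]
        for key, sub_schema in schema.get("properties", {}).items():
            if key in data:  # absent keys are skipped, not reported
                errors += check_against_schema(data[key], sub_schema, f"{path}.{key}")
    elif expected == "array":
        if not isinstance(data, list):
            return [f"{path}: expected array"]
        item_schema = schema.get("items", {})
        for index, item in enumerate(data):
            errors += check_against_schema(item, item_schema, f"{path}[{index}]")
    elif expected == "string":
        if not isinstance(data, str):
            errors.append(f"{path}: expected string")
    elif expected == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(data, bool) or not isinstance(data, (int, float)):
            errors.append(f"{path}: expected number")
    return errors


# Same schema as in the Hacker News request above
STORIES_SCHEMA = {
    "type": "object",
    "properties": {
        "stories": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "url": {"type": "string"},
                    "points": {"type": "number"},
                },
            },
        }
    },
}

# Made-up sample result: "points" came back as a string instead of a number
sample = {"stories": [{"title": "Example", "url": "https://example.com", "points": "12"}]}
print(check_against_schema(sample, STORIES_SCHEMA))
# ['$.stories[0].points: expected number']
```

An empty list means every value that is present has the declared type.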
3. Search and click
curl -X POST http://localhost:8000/api/v1/tasks \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_TOKEN" \
-d '{
"url": "https://www.google.com/search?q=skyvern+github",
"navigation_goal": "Click on the first GitHub result",
"data_extraction_goal": "Extract the repository name and description",
"proxy_location": "NONE",
"max_steps_per_run": 15
}'
4. Filling out a form
curl -X POST http://localhost:8000/api/v1/tasks \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_TOKEN" \
-d '{
"url": "https://example.com/contact",
"navigation_goal": "Fill out contact form with name: John Doe, email: john@example.com, message: Hello from Skyvern",
"data_extraction_goal": "Extract confirmation message after submit",
"navigation_payload": {
"name": "John Doe",
"email": "john@example.com",
"message": "Hello from Skyvern"
},
"proxy_location": "NONE",
"max_steps_per_run": 20
}'
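When the same values appear in both navigation_goal and navigation_payload, they can drift apart. One option is to keep the values only in the payload and have the goal refer to the field names. The sketch below assumes a hypothetical helper, build_form_task, which is not part of Skyvern; the request keys are the ones used in the curl example above.

```python
import json


def build_form_task(url, fields, max_steps=20):
    """Build a task body whose goal names the payload fields instead of repeating their values."""
    field_names = ", ".join(fields)  # joins the dict keys in insertion order
    return {
        "url": url,
        "navigation_goal": f"Fill out the form using the payload fields ({field_names}) and submit it",
        "data_extraction_goal": "Extract confirmation message after submit",
        "navigation_payload": dict(fields),
        "proxy_location": "NONE",
        "max_steps_per_run": max_steps,
    }


body = build_form_task(
    "https://example.com/contact",
    {"name": "John Doe", "email": "john@example.com", "message": "Hello from Skyvern"},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body of POST /api/v1/tasks.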
E-commerce examples
Parsing a product
curl -X POST http://localhost:8000/api/v1/tasks \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_TOKEN" \
-d '{
"url": "https://www.amazon.com/dp/PRODUCT_ID",
"navigation_goal": "Extract product information",
"data_extraction_goal": "Get product name, price, rating, availability",
"extracted_information_schema": {
"type": "object",
"properties": {
"product_name": {"type": "string"},
"price": {"type": "string"},
"rating": {"type": "number"},
"availability": {"type": "string"},
"description": {"type": "string"}
}
},
"proxy_location": "NONE"
}'
Searching for products
curl -X POST http://localhost:8000/api/v1/tasks \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_TOKEN" \
-d '{
"url": "https://www.ebay.com",
"navigation_goal": "Search for \"laptop\" and extract first 10 results",
"data_extraction_goal": "Extract product titles, prices, and seller ratings",
"navigation_payload": {
"search_query": "laptop"
},
"extracted_information_schema": {
"type": "object",
"properties": {
"products": {
"type": "array",
"items": {
"type": "object",
"properties": {
"title": {"type": "string"},
"price": {"type": "string"},
"seller_rating": {"type": "number"},
"url": {"type": "string"}
}
}
}
}
},
"proxy_location": "NONE",
"max_steps_per_run": 25
}'
Checking task status
# Get the status
curl http://localhost:8000/api/v1/tasks/TASK_ID \
-H "x-api-key: YOUR_TOKEN" | python3 -m json.tool
# Get screenshots (if available)
curl http://localhost:8000/api/v1/tasks/TASK_ID/screenshots \
-H "x-api-key: YOUR_TOKEN"
# Get browser logs
curl http://localhost:8000/api/v1/tasks/TASK_ID/browser_logs \
-H "x-api-key: YOUR_TOKEN"
Python example (using requests)
import json
import time

import requests

API_URL = "http://localhost:8000"
API_KEY = "YOUR_TOKEN_HERE"


def create_task(url, navigation_goal, extraction_goal, schema=None):
    """Create a Skyvern task."""
    headers = {
        "Content-Type": "application/json",
        "x-api-key": API_KEY,
    }
    payload = {
        "url": url,
        "navigation_goal": navigation_goal,
        "data_extraction_goal": extraction_goal,
        "proxy_location": "NONE",
    }
    if schema:
        payload["extracted_information_schema"] = schema
    response = requests.post(
        f"{API_URL}/api/v1/tasks",
        headers=headers,
        json=payload,
        timeout=30,
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()


def get_task_status(task_id):
    """Get task status and results."""
    headers = {"x-api-key": API_KEY}
    response = requests.get(
        f"{API_URL}/api/v1/tasks/{task_id}",
        headers=headers,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def wait_for_task(task_id, timeout=300, poll_interval=5):
    """Poll until the task completes, fails, or the timeout expires."""
    start_time = time.time()
    while time.time() - start_time < timeout:
        status = get_task_status(task_id)
        if status["status"] == "completed":
            return status
        elif status["status"] == "failed":
            raise RuntimeError(f"Task failed: {status.get('failure_reason')}")
        time.sleep(poll_interval)
    raise TimeoutError(f"Task did not complete within {timeout} seconds")


# Example usage
if __name__ == "__main__":
    # Create task
    task = create_task(
        url="https://www.python.org",
        navigation_goal="Extract Python version and features",
        extraction_goal="Get latest Python version and key features list",
        schema={
            "type": "object",
            "properties": {
                "version": {"type": "string"},
                "features": {
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
        },
    )
    task_id = task["task_id"]
    print(f"Created task: {task_id}")

    # Wait for completion
    result = wait_for_task(task_id)

    # Print results
    print("\nExtracted Information:")
    print(json.dumps(result["extracted_information"], indent=2))
Node.js example
const axios = require('axios');
const API_URL = 'http://localhost:8000';
const API_KEY = 'YOUR_TOKEN_HERE';
async function createTask(url, navigationGoal, extractionGoal, schema = null) {
try {
const response = await axios.post(
`${API_URL}/api/v1/tasks`,
{
url,
navigation_goal: navigationGoal,
data_extraction_goal: extractionGoal,
proxy_location: 'NONE',
...(schema && { extracted_information_schema: schema })
},
{
headers: {
'Content-Type': 'application/json',
'x-api-key': API_KEY
}
}
);
return response.data;
} catch (error) {
console.error('Error creating task:', error.response?.data || error.message);
throw error;
}
}
async function getTaskStatus(taskId) {
try {
const response = await axios.get(
`${API_URL}/api/v1/tasks/${taskId}`,
{
headers: { 'x-api-key': API_KEY }
}
);
return response.data;
} catch (error) {
console.error('Error getting task status:', error.response?.data || error.message);
throw error;
}
}
async function waitForTask(taskId, timeout = 300000, pollInterval = 5000) {
const startTime = Date.now();
while (Date.now() - startTime < timeout) {
const status = await getTaskStatus(taskId);
if (status.status === 'completed') {
return status;
} else if (status.status === 'failed') {
throw new Error(`Task failed: ${status.failure_reason}`);
}
await new Promise(resolve => setTimeout(resolve, pollInterval));
}
throw new Error(`Task did not complete within ${timeout}ms`);
}
// Example usage
(async () => {
try {
// Create task
const task = await createTask(
'https://news.ycombinator.com',
'Extract top stories',
'Get titles and URLs of top 5 stories',
{
type: 'object',
properties: {
stories: {
type: 'array',
items: {
type: 'object',
properties: {
title: { type: 'string' },
url: { type: 'string' }
}
}
}
}
}
);
console.log('Task created:', task.task_id);
// Wait for completion
const result = await waitForTask(task.task_id);
// Print results
console.log('\nExtracted Information:');
console.log(JSON.stringify(result.extracted_information, null, 2));
} catch (error) {
console.error('Error:', error.message);
}
})();
n8n integration
Create an HTTP Request node in n8n:
Settings:
- Method: POST
- URL: http://localhost:8000/api/v1/tasks
- Authentication: Header Auth
  - Name: x-api-key
  - Value: YOUR_TOKEN

Body (JSON):
{
"url": "{{$json.url}}",
"navigation_goal": "{{$json.navigation_goal}}",
"data_extraction_goal": "{{$json.extraction_goal}}",
"proxy_location": "NONE"
}
Then add a Wait node and another HTTP Request node to check the task status.
Best Practices
- Use proxy_location: "NONE" to use the system proxy
- Always specify extracted_information_schema for structured data
- Set max_steps_per_run to limit the number of steps
- Use complete_criterion for complex scenarios
- Add delays between requests when parsing in bulk
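The last point, spacing out requests during bulk parsing, can be sketched as follows. This is a minimal illustration under stated assumptions: submit_with_delay and fake_submit are hypothetical names, and the submit callable stands in for whatever function actually sends the POST to /api/v1/tasks. The 2-second default is an arbitrary placeholder, not a documented limit.

```python
import time


def submit_with_delay(payloads, submit, delay_seconds=2.0, sleep=time.sleep):
    """Submit payloads one by one, pausing between consecutive requests."""
    results = []
    for index, payload in enumerate(payloads):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        results.append(submit(payload))
    return results


urls = ["https://example.com/a", "https://example.com/b"]
payloads = [{"url": u, "proxy_location": "NONE", "max_steps_per_run": 10} for u in urls]


def fake_submit(payload):
    """Stand-in for a real HTTP call; echoes the URL instead of creating a task."""
    return {"submitted_url": payload["url"]}


print(submit_with_delay(payloads, fake_submit, delay_seconds=0.1))
```

Passing the sleep function as a parameter keeps the helper easy to test without real waiting.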
Troubleshooting
Task fails with "Country not supported"
Check that proxy_location: "NONE" is set and that HTTP_PROXY is configured in .env.
Task timeout
Increase max_steps_per_run or simplify navigation_goal.
Extraction returns empty data
Improve data_extraction_goal: be more specific about what to extract.
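To notice empty results automatically, so the goal can be tightened and the task rerun, a client can check the extracted data recursively. A minimal sketch, assuming a hypothetical helper is_empty_extraction; the exact shape of extracted_information depends on the schema you supplied.

```python
def is_empty_extraction(value):
    """Return True when an extracted value carries no data at any depth."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, dict):
        return all(is_empty_extraction(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return all(is_empty_extraction(v) for v in value)
    return False  # numbers and booleans count as data


print(is_empty_extraction({"stories": []}))  # True
print(is_empty_extraction({"stories": [{"title": "Example", "url": ""}]}))  # False
```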
Auth required pages
Use totp_verification_url and totp_identifier for 2FA/TOTP.
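As a sketch of where these fields go, the helper below adds them to an existing task payload. The two field names come from the text above; everything else is an assumption for illustration: with_totp is a hypothetical helper, the verification URL is a placeholder, and how the server uses each field should be confirmed against the Skyvern documentation.

```python
def with_totp(payload, verification_url=None, identifier=None):
    """Return a copy of a task payload with the provided TOTP fields added."""
    if verification_url is None and identifier is None:
        raise ValueError("provide verification_url, identifier, or both")
    updated = dict(payload)  # shallow copy; the original payload is left untouched
    if verification_url is not None:
        updated["totp_verification_url"] = verification_url
    if identifier is not None:
        updated["totp_identifier"] = identifier
    return updated


base = {
    "url": "https://example.com/login",
    "navigation_goal": "Log in and open the account page",
    "proxy_location": "NONE",
}
# Placeholder values: replace with your own endpoint and identifier
secured = with_totp(
    base,
    verification_url="https://example.com/totp",
    identifier="john@example.com",
)
print(sorted(secured))
```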
Author: GitHub Copilot
Project: DOROD / Skyvern Integration
Updated: 2026-02-20