Skyvern Parsing Examples

Examples of using Skyvern to scrape various websites.

Basic commands

1. Simple text extraction

```bash
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_TOKEN" \
  -d '{
    "url": "https://example.com",
    "navigation_goal": "Navigate to the page and extract heading",
    "data_extraction_goal": "Extract the main h1 heading",
    "proxy_location": "NONE"
  }'
```

2. Structured data extraction

```bash
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_TOKEN" \
  -d '{
    "url": "https://news.ycombinator.com",
    "navigation_goal": "Extract top stories from Hacker News",
    "data_extraction_goal": "Extract titles and URLs of top 5 stories",
    "extracted_information_schema": {
      "type": "object",
      "properties": {
        "stories": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": {"type": "string"},
              "url": {"type": "string"},
              "points": {"type": "number"}
            }
          }
        }
      }
    },
    "proxy_location": "NONE",
    "max_steps_per_run": 10
  }'
```
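The nested `extracted_information_schema` above is plain JSON Schema, so it can be generated instead of written by hand. A minimal Python sketch, assuming every item field is a simple JSON type; the helper `array_of_objects_schema` is hypothetical and not part of Skyvern:

```python
import json


def array_of_objects_schema(list_key, fields):
    """Build an object schema that holds one array of flat objects.

    `fields` maps each property name to a JSON Schema type name,
    e.g. {"title": "string", "points": "number"}.
    """
    item_properties = {name: {"type": json_type} for name, json_type in fields.items()}
    return {
        "type": "object",
        "properties": {
            list_key: {
                "type": "array",
                "items": {"type": "object", "properties": item_properties},
            }
        },
    }


# Rebuilds the Hacker News schema used in the request above.
schema = array_of_objects_schema(
    "stories", {"title": "string", "url": "string", "points": "number"}
)
print(json.dumps(schema, indent=2))
```

The result can be passed as the `schema` argument of `create_task` in the Python example.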

3. Search and click

```bash
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_TOKEN" \
  -d '{
    "url": "https://www.google.com/search?q=skyvern+github",
    "navigation_goal": "Click on the first GitHub result",
    "data_extraction_goal": "Extract the repository name and description",
    "proxy_location": "NONE",
    "max_steps_per_run": 15
  }'
```

4. Filling out a form

```bash
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_TOKEN" \
  -d '{
    "url": "https://example.com/contact",
    "navigation_goal": "Fill out the contact form using the values in navigation_payload and submit it",
    "data_extraction_goal": "Extract confirmation message after submit",
    "navigation_payload": {
      "name": "John Doe",
      "email": "john@example.com",
      "message": "Hello from Skyvern"
    },
    "proxy_location": "NONE",
    "max_steps_per_run": 20
  }'
```
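Because the field values already travel in `navigation_payload`, the goal text does not have to repeat them, which keeps the two from drifting apart. A minimal sketch of building such a request body in Python; `build_form_task` is a hypothetical helper, and the form field names are only the ones used in this example:

```python
import json


def build_form_task(url, form_values, max_steps=20):
    """Return a task body whose goal points at navigation_payload
    instead of restating the form values in prose."""
    return {
        "url": url,
        "navigation_goal": (
            "Fill out the contact form using the values in navigation_payload "
            "and submit it"
        ),
        "data_extraction_goal": "Extract confirmation message after submit",
        "navigation_payload": dict(form_values),  # copy so the caller's dict is not shared
        "proxy_location": "NONE",
        "max_steps_per_run": max_steps,
    }


body = build_form_task(
    "https://example.com/contact",
    {"name": "John Doe", "email": "john@example.com", "message": "Hello from Skyvern"},
)
print(json.dumps(body, indent=2))
```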

E-commerce examples

Scraping a product page

```bash
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_TOKEN" \
  -d '{
    "url": "https://www.amazon.com/dp/PRODUCT_ID",
    "navigation_goal": "Extract product information",
    "data_extraction_goal": "Get product name, price, rating, availability",
    "extracted_information_schema": {
      "type": "object",
      "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "string"},
        "rating": {"type": "number"},
        "availability": {"type": "string"},
        "description": {"type": "string"}
      }
    },
    "proxy_location": "NONE"
  }'
```
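The schema declares `price` as a string because product pages render prices with currency symbols and thousands separators. If a numeric value is needed downstream, the string can be normalized after extraction. A minimal sketch, assuming US-style formatting such as `$1,299.99` (other locales swap the separators and would need different handling); `parse_price` is a hypothetical helper:

```python
import re
from decimal import Decimal, InvalidOperation


def parse_price(raw):
    """Convert a US-formatted price string like "$1,299.99" to a Decimal.

    Returns None when the string contains no number. Assumes "," is the
    thousands separator and "." the decimal point; only the first number
    in the string is used.
    """
    if not raw:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    try:
        return Decimal(match.group(0).replace(",", ""))
    except InvalidOperation:
        return None


print(parse_price("$1,299.99"))              # 1299.99
print(parse_price("Currently unavailable"))  # None
```

`Decimal` is used instead of `float` so that money values keep their exact cents.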

Searching for products

```bash
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_TOKEN" \
  -d '{
    "url": "https://www.ebay.com",
    "navigation_goal": "Search for \"laptop\" and extract first 10 results",
    "data_extraction_goal": "Extract product titles, prices, and seller ratings",
    "navigation_payload": {
      "search_query": "laptop"
    },
    "extracted_information_schema": {
      "type": "object",
      "properties": {
        "products": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": {"type": "string"},
              "price": {"type": "string"},
              "seller_rating": {"type": "number"},
              "url": {"type": "string"}
            }
          }
        }
      }
    },
    "proxy_location": "NONE",
    "max_steps_per_run": 25
  }'
```

Checking task status

```bash
# Get the task status
curl http://localhost:8000/api/v1/tasks/TASK_ID \
  -H "x-api-key: YOUR_TOKEN" | python3 -m json.tool

# Get screenshots (if available)
curl http://localhost:8000/api/v1/tasks/TASK_ID/screenshots \
  -H "x-api-key: YOUR_TOKEN"

# Get browser logs
curl http://localhost:8000/api/v1/tasks/TASK_ID/browser_logs \
  -H "x-api-key: YOUR_TOKEN"
```
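A task can finish in states other than `completed`, so a status check should stop on any final state, not only on success. A minimal sketch, assuming the status values of Skyvern's task status enum (`created`, `queued`, `running` while active; `completed`, `failed`, `terminated`, `timed_out`, `canceled` once finished). Verify the list against your Skyvern version. `is_final` and `summarize` are hypothetical helpers that take the JSON returned by the status request above:

```python
# Assumed final task statuses; check against your Skyvern version.
FINAL_STATUSES = {"completed", "failed", "terminated", "timed_out", "canceled"}


def is_final(task):
    """True once the task status will no longer change."""
    return task.get("status") in FINAL_STATUSES


def summarize(task):
    """One-line summary of a task status response, for logs."""
    task_id = task.get("task_id")
    status = task.get("status", "unknown")
    if status == "completed":
        return f"{task_id}: completed"
    if status in FINAL_STATUSES:
        reason = task.get("failure_reason") or "no reason given"
        return f"{task_id}: {status} ({reason})"
    return f"{task_id}: still {status}"
```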

Python example (requests)

```python
import json
import time

import requests

API_URL = "http://localhost:8000"
API_KEY = "YOUR_TOKEN_HERE"

# Statuses after which a task no longer changes.
FINAL_STATUSES = {"completed", "failed", "terminated", "timed_out", "canceled"}


def create_task(url, navigation_goal, extraction_goal, schema=None):
    """Create a Skyvern task and return the API response as a dict."""
    headers = {
        "Content-Type": "application/json",
        "x-api-key": API_KEY,
    }

    payload = {
        "url": url,
        "navigation_goal": navigation_goal,
        "data_extraction_goal": extraction_goal,
        "proxy_location": "NONE",
    }

    if schema:
        payload["extracted_information_schema"] = schema

    response = requests.post(
        f"{API_URL}/api/v1/tasks",
        headers=headers,
        json=payload,
        timeout=30,
    )
    # Raise on 4xx/5xx instead of returning an error body as if it were a task
    response.raise_for_status()
    return response.json()


def get_task_status(task_id):
    """Get the current task status and results."""
    headers = {"x-api-key": API_KEY}
    response = requests.get(
        f"{API_URL}/api/v1/tasks/{task_id}",
        headers=headers,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def wait_for_task(task_id, timeout=300, poll_interval=5):
    """Poll until the task reaches a final status or the timeout expires."""
    start_time = time.time()

    while time.time() - start_time < timeout:
        task = get_task_status(task_id)
        status = task["status"]

        if status == "completed":
            return task
        if status in FINAL_STATUSES:
            # failed, terminated, timed_out or canceled: polling further is pointless
            raise RuntimeError(f"Task {status}: {task.get('failure_reason')}")

        time.sleep(poll_interval)

    raise TimeoutError(f"Task did not complete within {timeout} seconds")


# Example usage
if __name__ == "__main__":
    # Create task
    task = create_task(
        url="https://www.python.org",
        navigation_goal="Extract Python version and features",
        extraction_goal="Get latest Python version and key features list",
        schema={
            "type": "object",
            "properties": {
                "version": {"type": "string"},
                "features": {
                    "type": "array",
                    "items": {"type": "string"}
                }
            }
        }
    )

    task_id = task["task_id"]
    print(f"Created task: {task_id}")

    # Wait for completion
    result = wait_for_task(task_id)

    # Print results
    print("\nExtracted Information:")
    print(json.dumps(result["extracted_information"], indent=2))
```

Node.js example

```javascript
const axios = require('axios');

const API_URL = 'http://localhost:8000';
const API_KEY = 'YOUR_TOKEN_HERE';

// Statuses after which a task no longer changes.
const FINAL_STATUSES = new Set(['completed', 'failed', 'terminated', 'timed_out', 'canceled']);

async function createTask(url, navigationGoal, extractionGoal, schema = null) {
  try {
    const response = await axios.post(
      `${API_URL}/api/v1/tasks`,
      {
        url,
        navigation_goal: navigationGoal,
        data_extraction_goal: extractionGoal,
        proxy_location: 'NONE',
        ...(schema && { extracted_information_schema: schema })
      },
      {
        headers: {
          'Content-Type': 'application/json',
          'x-api-key': API_KEY
        },
        timeout: 30000
      }
    );
    return response.data;
  } catch (error) {
    console.error('Error creating task:', error.response?.data || error.message);
    throw error;
  }
}

async function getTaskStatus(taskId) {
  try {
    const response = await axios.get(
      `${API_URL}/api/v1/tasks/${taskId}`,
      {
        headers: { 'x-api-key': API_KEY },
        timeout: 30000
      }
    );
    return response.data;
  } catch (error) {
    console.error('Error getting task status:', error.response?.data || error.message);
    throw error;
  }
}

async function waitForTask(taskId, timeout = 300000, pollInterval = 5000) {
  const startTime = Date.now();

  while (Date.now() - startTime < timeout) {
    const task = await getTaskStatus(taskId);

    if (task.status === 'completed') {
      return task;
    }
    if (FINAL_STATUSES.has(task.status)) {
      // failed, terminated, timed_out or canceled: polling further is pointless
      throw new Error(`Task ${task.status}: ${task.failure_reason}`);
    }

    await new Promise(resolve => setTimeout(resolve, pollInterval));
  }

  throw new Error(`Task did not complete within ${timeout}ms`);
}

// Example usage
(async () => {
  try {
    // Create task
    const task = await createTask(
      'https://news.ycombinator.com',
      'Extract top stories',
      'Get titles and URLs of top 5 stories',
      {
        type: 'object',
        properties: {
          stories: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                title: { type: 'string' },
                url: { type: 'string' }
              }
            }
          }
        }
      }
    );

    console.log('Task created:', task.task_id);

    // Wait for completion
    const result = await waitForTask(task.task_id);

    // Print results
    console.log('\nExtracted Information:');
    console.log(JSON.stringify(result.extracted_information, null, 2));
  } catch (error) {
    console.error('Error:', error.message);
  }
})();
```

n8n integration

Create an HTTP Request node in n8n:

Settings:

  • Method: POST
  • URL: http://localhost:8000/api/v1/tasks
  • Authentication: Header Auth
    • Name: x-api-key
    • Value: YOUR_TOKEN

Body (JSON):

```json
{
  "url": "{{$json.url}}",
  "navigation_goal": "{{$json.navigation_goal}}",
  "data_extraction_goal": "{{$json.extraction_goal}}",
  "proxy_location": "NONE"
}
```

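Then add a Wait node and a second HTTP Request node that checks the task status with a GET request to http://localhost:8000/api/v1/tasks/TASK_ID, using the same x-api-key header.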

Best Practices

  1. Use proxy_location: "NONE" to route traffic through the system proxy
  2. Always specify extracted_information_schema when you need structured data
  3. Set max_steps_per_run to cap the number of steps a task may take
  4. Use complete_criterion in complex scenarios to state explicitly when the task counts as done
  5. Add delays between requests when scraping at scale
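Practice 5 (delays between requests) can be sketched as a sequential submitter that pauses between calls. The `submit` callable stands in for something like `create_task` from the Python example, and the default delay is an arbitrary assumption, not a Skyvern recommendation; `submit_with_delay` is a hypothetical helper:

```python
import time


def submit_with_delay(jobs, submit, delay_seconds=2.0, sleep=time.sleep):
    """Submit jobs one by one, pausing between consecutive submissions.

    `jobs` is an iterable of keyword-argument dicts passed to `submit`.
    `sleep` is injectable so the throttling can be tested without waiting.
    Returns the submit() results in order.
    """
    results = []
    for index, job in enumerate(jobs):
        if index > 0:
            sleep(delay_seconds)  # throttle between calls, not before the first one
        results.append(submit(**job))
    return results


# Usage with the Python example (not run here):
# submit_with_delay(
#     [{"url": "https://example.com", "navigation_goal": "...", "extraction_goal": "..."}],
#     create_task,
# )
```

Submitting sequentially also keeps the number of concurrently running browser sessions low, which matters on a single local Skyvern instance.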

Troubleshooting

Task fails with "Country not supported"

Check that proxy_location: "NONE" is set and that HTTP_PROXY is configured in .env.

Task timeout

Increase max_steps_per_run or simplify navigation_goal.
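When resubmitting a task that ran out of steps, the step budget can be raised programmatically rather than edited by hand. A minimal sketch; `with_larger_step_budget` is a hypothetical helper, and the fallback of 10 steps for bodies that omit `max_steps_per_run` and the cap of 50 are assumptions, not Skyvern defaults:

```python
import copy


def with_larger_step_budget(payload, factor=2, cap=50, default_steps=10):
    """Return a copy of a task body with max_steps_per_run scaled up.

    default_steps is an assumed starting value for bodies without
    max_steps_per_run; cap stops a looping task from burning too many steps.
    """
    new_payload = copy.deepcopy(payload)  # leave the original body untouched
    current = new_payload.get("max_steps_per_run", default_steps)
    new_payload["max_steps_per_run"] = min(current * factor, cap)
    return new_payload
```

If doubling the budget does not help, the goal itself is usually too broad and is better split or simplified.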

Extraction returns empty data

Refine data_extraction_goal: be more specific about what to extract.
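To catch this case automatically, a result can be checked for the absence of real values before it is stored. A minimal sketch; `is_empty_extraction` is a hypothetical helper that treats `None`, blank strings, and containers holding only such values as empty, while numbers and booleans (including `0`) count as data:

```python
def is_empty_extraction(value):
    """Return True when extracted data contains no real values."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, dict):
        # all() of an empty dict is True, so {} counts as empty
        return all(is_empty_extraction(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return all(is_empty_extraction(v) for v in value)
    return False  # numbers, booleans and other scalars are real data
```

It can be applied to `result["extracted_information"]` from the Python example to decide whether to rerun the task with a sharper goal.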

Pages that require authentication

Use totp_verification_url and totp_identifier for 2FA/TOTP.


Author: GitHub Copilot
Project: DOROD / Skyvern Integration
Updated: 2026-02-20