# Skyvern Parsing Examples
Examples of using Skyvern to parse various websites.
## Basic commands
### 1. Simple text extraction
```bash
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_TOKEN" \
  -d '{
    "url": "https://example.com",
    "navigation_goal": "Navigate to the page and extract heading",
    "data_extraction_goal": "Extract the main h1 heading",
    "proxy_location": "NONE"
  }'
```
### 2. Structured data extraction
```bash
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_TOKEN" \
  -d '{
    "url": "https://news.ycombinator.com",
    "navigation_goal": "Extract top stories from Hacker News",
    "data_extraction_goal": "Extract titles, URLs, and points of top 5 stories",
    "extracted_information_schema": {
      "type": "object",
      "properties": {
        "stories": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": {"type": "string"},
              "url": {"type": "string"},
              "points": {"type": "number"}
            }
          }
        }
      }
    },
    "proxy_location": "NONE",
    "max_steps_per_run": 10
  }'
```
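Writing nested schemas by hand inside a shell string is easy to get wrong. As a minimal sketch, the schema above can be generated in Python instead. The helper names `build_object_schema` and `build_list_schema` are hypothetical and not part of Skyvern; they only produce a plain dictionary with the same shape as the JSON shown in the request.

```python
import json


def build_object_schema(fields):
    """Build a JSON Schema object from a {field_name: json_type} mapping."""
    return {
        "type": "object",
        "properties": {
            name: {"type": json_type} for name, json_type in fields.items()
        },
    }


def build_list_schema(key, item_fields):
    """Wrap an item schema in an object that has a single array property."""
    return {
        "type": "object",
        "properties": {
            key: {"type": "array", "items": build_object_schema(item_fields)},
        },
    }


# Same structure as the extracted_information_schema in the curl request.
stories_schema = build_list_schema(
    "stories", {"title": "string", "url": "string", "points": "number"}
)
print(json.dumps(stories_schema, indent=2))
```

The printed JSON can be pasted into the `extracted_information_schema` field of a request body.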
### 3. Search and click
```bash
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_TOKEN" \
  -d '{
    "url": "https://www.google.com/search?q=skyvern+github",
    "navigation_goal": "Click on the first GitHub result",
    "data_extraction_goal": "Extract the repository name and description",
    "proxy_location": "NONE",
    "max_steps_per_run": 15
  }'
```
### 4. Filling out a form
```bash
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_TOKEN" \
  -d '{
    "url": "https://example.com/contact",
    "navigation_goal": "Fill out contact form with name: John Doe, email: john@example.com, message: Hello from Skyvern",
    "data_extraction_goal": "Extract confirmation message after submit",
    "navigation_payload": {
      "name": "John Doe",
      "email": "john@example.com",
      "message": "Hello from Skyvern"
    },
    "proxy_location": "NONE",
    "max_steps_per_run": 20
  }'
```
## E-commerce examples
### Parsing a product
```bash
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_TOKEN" \
  -d '{
    "url": "https://www.amazon.com/dp/PRODUCT_ID",
    "navigation_goal": "Extract product information",
    "data_extraction_goal": "Get product name, price, rating, availability, and description",
    "extracted_information_schema": {
      "type": "object",
      "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "string"},
        "rating": {"type": "number"},
        "availability": {"type": "string"},
        "description": {"type": "string"}
      }
    },
    "proxy_location": "NONE"
  }'
```
### Searching for products
```bash
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_TOKEN" \
  -d '{
    "url": "https://www.ebay.com",
    "navigation_goal": "Search for \"laptop\" and extract first 10 results",
    "data_extraction_goal": "Extract product titles, prices, and seller ratings",
    "navigation_payload": {
      "search_query": "laptop"
    },
    "extracted_information_schema": {
      "type": "object",
      "properties": {
        "products": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": {"type": "string"},
              "price": {"type": "string"},
              "seller_rating": {"type": "number"},
              "url": {"type": "string"}
            }
          }
        }
      }
    },
    "proxy_location": "NONE",
    "max_steps_per_run": 25
  }'
```
## Checking task status
```bash
# Get the task status
curl http://localhost:8000/api/v1/tasks/TASK_ID \
  -H "x-api-key: YOUR_TOKEN" | python3 -m json.tool
# Get screenshots (if available)
curl http://localhost:8000/api/v1/tasks/TASK_ID/screenshots \
  -H "x-api-key: YOUR_TOKEN"
# Get browser logs
curl http://localhost:8000/api/v1/tasks/TASK_ID/browser_logs \
  -H "x-api-key: YOUR_TOKEN"
```
## Python example (using `requests`)
```python
import json
import time

import requests

API_URL = "http://localhost:8000"
API_KEY = "YOUR_TOKEN_HERE"


def create_task(url, navigation_goal, extraction_goal, schema=None):
    """Create a Skyvern task."""
    headers = {
        "Content-Type": "application/json",
        "x-api-key": API_KEY,
    }
    payload = {
        "url": url,
        "navigation_goal": navigation_goal,
        "data_extraction_goal": extraction_goal,
        "proxy_location": "NONE",
    }
    if schema:
        payload["extracted_information_schema"] = schema
    response = requests.post(
        f"{API_URL}/api/v1/tasks",
        headers=headers,
        json=payload,
    )
    # Fail early on HTTP errors instead of returning a body without "task_id".
    response.raise_for_status()
    return response.json()


def get_task_status(task_id):
    """Get task status and results."""
    headers = {"x-api-key": API_KEY}
    response = requests.get(
        f"{API_URL}/api/v1/tasks/{task_id}",
        headers=headers,
    )
    response.raise_for_status()
    return response.json()


def wait_for_task(task_id, timeout=300, poll_interval=5):
    """Poll the task until it completes, fails, or the timeout expires."""
    start_time = time.time()
    while time.time() - start_time < timeout:
        status = get_task_status(task_id)
        if status["status"] == "completed":
            return status
        elif status["status"] == "failed":
            raise RuntimeError(f"Task failed: {status.get('failure_reason')}")
        time.sleep(poll_interval)
    raise TimeoutError(f"Task did not complete within {timeout} seconds")


# Example usage
if __name__ == "__main__":
    # Create task
    task = create_task(
        url="https://www.python.org",
        navigation_goal="Extract Python version and features",
        extraction_goal="Get latest Python version and key features list",
        schema={
            "type": "object",
            "properties": {
                "version": {"type": "string"},
                "features": {
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
        },
    )
    task_id = task["task_id"]
    print(f"Created task: {task_id}")

    # Wait for completion
    result = wait_for_task(task_id)

    # Print results
    print("\nExtracted Information:")
    print(json.dumps(result["extracted_information"], indent=2))
```
## Node.js example
```javascript
const axios = require('axios');

const API_URL = 'http://localhost:8000';
const API_KEY = 'YOUR_TOKEN_HERE';

async function createTask(url, navigationGoal, extractionGoal, schema = null) {
  try {
    const response = await axios.post(
      `${API_URL}/api/v1/tasks`,
      {
        url,
        navigation_goal: navigationGoal,
        data_extraction_goal: extractionGoal,
        proxy_location: 'NONE',
        ...(schema && { extracted_information_schema: schema })
      },
      {
        headers: {
          'Content-Type': 'application/json',
          'x-api-key': API_KEY
        }
      }
    );
    return response.data;
  } catch (error) {
    console.error('Error creating task:', error.response?.data || error.message);
    throw error;
  }
}

async function getTaskStatus(taskId) {
  try {
    const response = await axios.get(
      `${API_URL}/api/v1/tasks/${taskId}`,
      {
        headers: { 'x-api-key': API_KEY }
      }
    );
    return response.data;
  } catch (error) {
    console.error('Error getting task status:', error.response?.data || error.message);
    throw error;
  }
}

async function waitForTask(taskId, timeout = 300000, pollInterval = 5000) {
  const startTime = Date.now();
  while (Date.now() - startTime < timeout) {
    const status = await getTaskStatus(taskId);
    if (status.status === 'completed') {
      return status;
    } else if (status.status === 'failed') {
      throw new Error(`Task failed: ${status.failure_reason}`);
    }
    await new Promise(resolve => setTimeout(resolve, pollInterval));
  }
  throw new Error(`Task did not complete within ${timeout}ms`);
}

// Example usage
(async () => {
  try {
    // Create task
    const task = await createTask(
      'https://news.ycombinator.com',
      'Extract top stories',
      'Get titles and URLs of top 5 stories',
      {
        type: 'object',
        properties: {
          stories: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                title: { type: 'string' },
                url: { type: 'string' }
              }
            }
          }
        }
      }
    );
    console.log('Task created:', task.task_id);

    // Wait for completion
    const result = await waitForTask(task.task_id);

    // Print results
    console.log('\nExtracted Information:');
    console.log(JSON.stringify(result.extracted_information, null, 2));
  } catch (error) {
    console.error('Error:', error.message);
  }
})();
```
## n8n integration
Create an HTTP Request node in n8n:
**Settings:**
- Method: `POST`
- URL: `http://localhost:8000/api/v1/tasks`
- Authentication: `Header Auth`
  - Name: `x-api-key`
  - Value: `YOUR_TOKEN`

**Body (JSON):**
```json
{
  "url": "{{$json.url}}",
  "navigation_goal": "{{$json.navigation_goal}}",
  "data_extraction_goal": "{{$json.extraction_goal}}",
  "proxy_location": "NONE"
}
```
Then add a Wait node and another HTTP Request node to check the task status.
## Best Practices
1. **Use `proxy_location: "NONE"`** to route traffic through the system proxy
2. **Always specify `extracted_information_schema`** for structured data
3. **Set `max_steps_per_run`** to cap the number of steps
4. **Use `complete_criterion`** for complex scenarios
5. **Add delays** between requests when parsing in bulk
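
The last practice, spacing out requests during bulk parsing, could look roughly like the sketch below. The names `make_payload` and `submit_all` are hypothetical helpers, not Skyvern functions, and the 2-second default delay is an arbitrary assumption. The `submit` argument is any callable that sends one payload; the demo uses a stand-in so the block runs without a server.

```python
import time


def make_payload(url, extraction_goal, max_steps=10):
    """Build a task body that follows practices 1 and 3 from the list above."""
    return {
        "url": url,
        "data_extraction_goal": extraction_goal,
        "proxy_location": "NONE",
        "max_steps_per_run": max_steps,
    }


def submit_all(payloads, submit, delay_seconds=2.0, sleep=time.sleep):
    """Submit payloads one at a time, pausing between consecutive requests."""
    results = []
    for index, payload in enumerate(payloads):
        if index > 0:
            # No pause before the first request, only between requests.
            sleep(delay_seconds)
        results.append(submit(payload))
    return results


if __name__ == "__main__":
    # Stand-in for a real HTTP call (assumption: replace with your own sender).
    def fake_submit(payload):
        return {"submitted_url": payload["url"]}

    urls = ["https://example.com/a", "https://example.com/b"]
    payloads = [make_payload(u, "Extract the main h1 heading") for u in urls]
    for result in submit_all(payloads, fake_submit, delay_seconds=0.1):
        print(result)
```

In real use, `submit` could be a small wrapper around the `create_task` function from the Python example in this document.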
## Troubleshooting
### Task fails with "Country not supported"
Check that `proxy_location: "NONE"` is set and that `HTTP_PROXY` is configured in `.env`.
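
Both conditions can be checked before sending a request. The following is a minimal sketch, assuming a simple `KEY=VALUE` style `.env` file; `parse_env` and `proxy_problems` are hypothetical helper names, and the sample proxy address is a placeholder.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def proxy_problems(env_values, payload):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if payload.get("proxy_location") != "NONE":
        problems.append('proxy_location is not "NONE"')
    if not env_values.get("HTTP_PROXY"):
        problems.append("HTTP_PROXY is missing or empty in .env")
    return problems


if __name__ == "__main__":
    # Inline sample; in practice read the text from your own .env file.
    sample_env = '# proxy settings\nHTTP_PROXY="http://127.0.0.1:3128"\n'
    payload = {"url": "https://example.com", "proxy_location": "NONE"}
    print(proxy_problems(parse_env(sample_env), payload))
```

To check a real file, pass `pathlib.Path(".env").read_text()` to `parse_env` instead of the inline sample.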
### Task timeout
Increase `max_steps_per_run` or simplify `navigation_goal`.
### Extraction returns empty data
Improve `data_extraction_goal`: be more specific about what to extract.
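
To see which fields came back empty, and therefore which parts of the goal need more detail, a small recursive check can help. This is a sketch only: `empty_paths` is a hypothetical helper, and the sample dictionary is made-up data shaped like the Hacker News schema in this document, not real output.

```python
def empty_paths(value, path="extracted_information"):
    """Yield the path of every None, empty string, empty list, or empty dict."""
    if value is None or value == "" or value == [] or value == {}:
        yield path
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from empty_paths(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from empty_paths(child, f"{path}[{index}]")


if __name__ == "__main__":
    # Made-up sample for illustration only.
    sample = {
        "stories": [
            {"title": "First story", "url": ""},
            {"title": None, "url": "https://example.com/2"},
        ]
    }
    for path in empty_paths(sample):
        print("empty:", path)
```

Fields that show up repeatedly in this report are good candidates to name explicitly in `data_extraction_goal`.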
### Auth required pages
Use `totp_verification_url` and `totp_identifier` for 2FA/TOTP.
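
As a rough sketch, the two fields can be added to an existing task body as shown below. Only the field names come from this document; the helper `with_totp`, the example URLs, and the identifier value are placeholders, and the exact meaning of each field should be confirmed against the Skyvern documentation.

```python
import json


def with_totp(payload, verification_url, identifier):
    """Return a copy of the payload with the two TOTP fields added."""
    updated = dict(payload)
    updated["totp_verification_url"] = verification_url
    updated["totp_identifier"] = identifier
    return updated


if __name__ == "__main__":
    base = {
        "url": "https://example.com/login",
        "navigation_goal": "Log in and open the account page",
        "proxy_location": "NONE",
    }
    # Placeholder values (assumptions), not real endpoints or accounts.
    body = with_totp(base, "https://example.com/totp", "user@example.com")
    print(json.dumps(body, indent=2))
```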
---
**Author**: GitHub Copilot
**Project**: DOROD / Skyvern Integration
**Updated**: 2026-02-20