By default, Skyvern returns extracted data in whatever format makes sense for the task.
Pass a `data_extraction_schema` to enforce a specific structure using [JSON Schema.](https://json-schema.org/)
---
## Define a schema
Add `data_extraction_schema` parameter to your task with a JSON Schema object:
<CodeGroup>
```python Python
result = await client.run_task(
prompt="Get the title of the top post",
url="https://news.ycombinator.com",
data_extraction_schema={
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "The title of the top post"
}
}
}
)
```
```typescript TypeScript
const result = await client.runTask({
body: {
prompt: "Get the title of the top post",
url: "https://news.ycombinator.com",
data_extraction_schema: {
type: "object",
properties: {
title: {
type: "string",
description: "The title of the top post",
},
},
},
},
});
```
```bash cURL
curl -X POST "https://api.skyvern.com/v1/run/tasks" \
-H "x-api-key: $SKYVERN_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Get the title of the top post",
"url": "https://news.ycombinator.com",
"data_extraction_schema": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "The title of the top post"
}
}
}
}'
```
</CodeGroup>
The `description` field in each property helps Skyvern understand what data to extract. Be specific.
<Warning>
`description` fields drive extraction quality. Vague descriptions like "the data" produce vague results. Be specific: "The product price in USD, without currency symbol."
Arrays without limits extract everything visible on the page. Specify limits in your prompt (e.g., "top 5 posts") or the array description to control output size.
</Tip>
---
### Nested objects
Extract hierarchical data, such as a product with its pricing and availability:
<CodeGroup>
```python Python
result = await client.run_task(
prompt="Get product details including pricing and availability",
url="https://www.amazon.com/dp/B0EXAMPLE",
data_extraction_schema={
"type": "object",
"properties": {
"product": {
"type": "object",
"description": "Product information",
"properties": {
"name": {
"type": "string",
"description": "Product name"
},
"pricing": {
"type": "object",
"description": "Pricing details",
"properties": {
"current_price": {
"type": "number",
"description": "Current price in USD"
},
"original_price": {
"type": "number",
"description": "Original price before discount"
},
"discount_percent": {
"type": "integer",
"description": "Discount percentage"
}
}
},
"availability": {
"type": "object",
"description": "Stock information",
"properties": {
"in_stock": {
"type": "boolean",
"description": "Whether the item is in stock"
},
"delivery_estimate": {
"type": "string",
"description": "Estimated delivery date"
}
}
}
}
}
}
}
)
```
```typescript TypeScript
const result = await client.runTask({
body: {
prompt: "Get product details including pricing and availability",
url: "https://www.amazon.com/dp/B0EXAMPLE",
data_extraction_schema: {
type: "object",
properties: {
product: {
type: "object",
description: "Product information",
properties: {
name: {
type: "string",
description: "Product name",
},
pricing: {
type: "object",
description: "Pricing details",
properties: {
current_price: {
type: "number",
description: "Current price in USD",
},
original_price: {
type: "number",
description: "Original price before discount",
},
discount_percent: {
type: "integer",
description: "Discount percentage",
},
},
},
availability: {
type: "object",
description: "Stock information",
properties: {
in_stock: {
type: "boolean",
description: "Whether the item is in stock",
},
delivery_estimate: {
type: "string",
description: "Estimated delivery date",
},
},
},
},
},
},
},
},
});
```
```bash cURL
curl -X POST "https://api.skyvern.com/v1/run/tasks" \
-H "x-api-key: $SKYVERN_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Get product details including pricing and availability",
"url": "https://www.amazon.com/dp/B0EXAMPLE",
"data_extraction_schema": {
"type": "object",
"properties": {
"product": {
"type": "object",
"description": "Product information",
"properties": {
"name": {
"type": "string",
"description": "Product name"
},
"pricing": {
"type": "object",
"description": "Pricing details",
"properties": {
"current_price": {
"type": "number",
"description": "Current price in USD"
},
"original_price": {
"type": "number",
"description": "Original price before discount"
},
"discount_percent": {
"type": "integer",
"description": "Discount percentage"
}
}
},
"availability": {
"type": "object",
"description": "Stock information",
"properties": {
"in_stock": {
"type": "boolean",
"description": "Whether the item is in stock"
},
"delivery_estimate": {
"type": "string",
"description": "Estimated delivery date"
}
}
}
}
}
}
}
}'
```
</CodeGroup>
**Output (when completed):**
```json
{
"product": {
"name": "Wireless Bluetooth Headphones",
"pricing": {
"current_price": 79.99,
"original_price": 129.99,
"discount_percent": 38
},
"availability": {
"in_stock": true,
"delivery_estimate": "Tomorrow, Jan 21"
}
}
}
```
---
## Accessing extracted data
The extracted data appears in the `output` field of the completed run. Poll until the task reaches a terminal state, then access the output.