---
title: Extract Structured Data
subtitle: Get consistent, typed output from your tasks
slug: running-automations/extract-structured-data
---
export const FIELD_TYPES = [
{ value: "string", label: "String" },
{ value: "number", label: "Number" },
{ value: "integer", label: "Integer" },
{ value: "boolean", label: "Boolean" },
]
export const SchemaBuilder = () => {
const [schemaType, setSchemaType] = useState("single")
const [arrayName, setArrayName] = useState("items")
const [fields, setFields] = useState([
{ id: "1", name: "title", type: "string", description: "The title" },
])
const [outputFormat, setOutputFormat] = useState("python")
const [copied, setCopied] = useState(false)
const addField = () => {
setFields([...fields, { id: String(Date.now()), name: "", type: "string", description: "" }])
}
const removeField = (id) => {
if (fields.length > 1) setFields(fields.filter((f) => f.id !== id))
}
const updateField = (id, key, value) => {
setFields(fields.map((f) => (f.id === id ? { ...f, [key]: value } : f)))
}
const duplicateNames = useMemo(() => {
const names = fields.map((f) => f.name).filter((n) => n.trim() !== "")
const counts = {}
for (const n of names) {
counts[n] = (counts[n] || 0) + 1
}
return new Set(Object.keys(counts).filter((n) => counts[n] > 1))
}, [fields])
const schema = useMemo(() => {
const properties = {}
fields.forEach((field) => {
if (field.name) {
properties[field.name] = {
type: field.type,
description: field.description || `The ${field.name}`,
}
}
})
if (schemaType === "array") {
return {
type: "object",
properties: {
[arrayName]: {
type: "array",
description: "List of extracted items",
items: { type: "object", properties },
},
},
}
}
return { type: "object", properties }
}, [fields, schemaType, arrayName])
const formattedOutput = useMemo(() => {
const jsonStr = JSON.stringify(schema, null, 2)
if (outputFormat === "python") {
return `data_extraction_schema=${jsonStr.replace(/: null/g, ": None").replace(/: true/g, ": True").replace(/: false/g, ": False")}`
}
if (outputFormat === "typescript") {
return `data_extraction_schema: ${jsonStr}`
}
return `"data_extraction_schema": ${jsonStr}`
}, [schema, outputFormat])
const copyToClipboard = async () => {
await navigator.clipboard.writeText(formattedOutput)
setCopied(true)
setTimeout(() => setCopied(false), 2000)
}
return (
{[
{ value: "single", label: "Single object", desc: "Extract one item with multiple fields" },
{ value: "array", label: "List of items", desc: "Extract multiple items with the same structure" },
].map((type) => (
))}
{schemaType === "array" && (
setArrayName(e.target.value)}
className="w-full p-2 border rounded-md text-sm"
placeholder="items"
/>
)}
{duplicateNames.size > 0 && (
Duplicate field names detected. Only the last field with each name will appear in the schema.
)}
{["python", "typescript", "curl"].map((format) => (
))}
{formattedOutput}
)
}
By default, Skyvern returns extracted data in whatever format makes sense for the task.
Pass a `data_extraction_schema` to enforce a specific structure using [JSON Schema.](https://json-schema.org/)
---
## Define a schema
Add `data_extraction_schema` parameter to your task with a JSON Schema object:
```python Python
result = await client.run_task(
prompt="Get the title of the top post",
url="https://news.ycombinator.com",
data_extraction_schema={
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "The title of the top post"
}
}
}
)
```
```typescript TypeScript
const result = await client.runTask({
body: {
prompt: "Get the title of the top post",
url: "https://news.ycombinator.com",
data_extraction_schema: {
type: "object",
properties: {
title: {
type: "string",
description: "The title of the top post",
},
},
},
},
});
```
```bash cURL
curl -X POST "https://api.skyvern.com/v1/run/tasks" \
-H "x-api-key: $SKYVERN_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Get the title of the top post",
"url": "https://news.ycombinator.com",
"data_extraction_schema": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "The title of the top post"
}
}
}
}'
```
The `description` field in each property helps Skyvern understand what data to extract. Be specific.
`description` fields drive extraction quality. Vague descriptions like "the data" produce vague results. Be specific: "The product price in USD, without currency symbol."
---
## Schema format
Skyvern uses standard JSON Schema. Common types:
| Type | JSON Schema | Example value |
|------|-------------|---------------|
| String | `{"type": "string"}` | `"Hello world"` |
| Number | `{"type": "number"}` | `19.99` |
| Integer | `{"type": "integer"}` | `42` |
| Boolean | `{"type": "boolean"}` | `true` |
| Array | `{"type": "array", "items": {...}}` | `[1, 2, 3]` |
| Object | `{"type": "object", "properties": {...}}` | `{"key": "value"}` |
A schema doesn't guarantee all fields are populated. If the data isn't on the page, fields return `null`. Design your code to handle missing values.
---
## Build your schema
Use the interactive builder to generate a schema, then copy it into your code.
---
## Examples
### Single value
Extract one piece of information, such as the current price of Bitcoin:
```python Python
result = await client.run_task(
prompt="Get the current Bitcoin price in USD",
url="https://coinmarketcap.com/currencies/bitcoin/",
data_extraction_schema={
"type": "object",
"properties": {
"price": {
"type": "number",
"description": "Current Bitcoin price in USD"
}
}
}
)
```
```typescript TypeScript
const result = await client.runTask({
body: {
prompt: "Get the current Bitcoin price in USD",
url: "https://coinmarketcap.com/currencies/bitcoin/",
data_extraction_schema: {
type: "object",
properties: {
price: {
type: "number",
description: "Current Bitcoin price in USD",
},
},
},
},
});
```
```bash cURL
curl -X POST "https://api.skyvern.com/v1/run/tasks" \
-H "x-api-key: $SKYVERN_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Get the current Bitcoin price in USD",
"url": "https://coinmarketcap.com/currencies/bitcoin/",
"data_extraction_schema": {
"type": "object",
"properties": {
"price": {
"type": "number",
"description": "Current Bitcoin price in USD"
}
}
}
}'
```
**Output (when completed):**
```json
{
"price": 104521.37
}
```
---
### List of items
Extract multiple items with the same structure, such as the top posts from a news site:
```python Python
result = await client.run_task(
prompt="Get the top 5 posts",
url="https://news.ycombinator.com",
data_extraction_schema={
"type": "object",
"properties": {
"posts": {
"type": "array",
"description": "Top 5 posts from the front page",
"items": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Post title"
},
"points": {
"type": "integer",
"description": "Number of points"
},
"url": {
"type": "string",
"description": "Link to the post"
}
}
}
}
}
}
)
```
```typescript TypeScript
const result = await client.runTask({
body: {
prompt: "Get the top 5 posts",
url: "https://news.ycombinator.com",
data_extraction_schema: {
type: "object",
properties: {
posts: {
type: "array",
description: "Top 5 posts from the front page",
items: {
type: "object",
properties: {
title: {
type: "string",
description: "Post title",
},
points: {
type: "integer",
description: "Number of points",
},
url: {
type: "string",
description: "Link to the post",
},
},
},
},
},
},
},
});
```
```bash cURL
curl -X POST "https://api.skyvern.com/v1/run/tasks" \
-H "x-api-key: $SKYVERN_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Get the top 5 posts",
"url": "https://news.ycombinator.com",
"data_extraction_schema": {
"type": "object",
"properties": {
"posts": {
"type": "array",
"description": "Top 5 posts from the front page",
"items": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Post title"
},
"points": {
"type": "integer",
"description": "Number of points"
},
"url": {
"type": "string",
"description": "Link to the post"
}
}
}
}
}
}
}'
```
**Output (when completed):**
```json
{
"posts": [
{
"title": "Running Claude Code dangerously (safely)",
"points": 342,
"url": "https://blog.emilburzo.com/2026/01/running-claude-code-dangerously-safely/"
},
{
"title": "Linux kernel framework for PCIe device emulation",
"points": 287,
"url": "https://github.com/cakehonolulu/pciem"
},
{
"title": "I'm addicted to being useful",
"points": 256,
"url": "https://www.seangoedecke.com/addicted-to-being-useful/"
},
{
"title": "Level S4 solar radiation event",
"points": 198,
"url": "https://www.swpc.noaa.gov/news/g4-severe-geomagnetic-storm"
},
{
"title": "WebAssembly Text Format parser performance",
"points": 176,
"url": "https://blog.gplane.win/posts/improve-wat-parser-perf.html"
}
]
}
```
Arrays without limits extract everything visible on the page. Specify limits in your prompt (e.g., "top 5 posts") or the array description to control output size.
---
### Nested objects
Extract hierarchical data, such as a product with its pricing and availability:
```python Python
result = await client.run_task(
prompt="Get product details including pricing and availability",
url="https://www.amazon.com/dp/B0EXAMPLE",
data_extraction_schema={
"type": "object",
"properties": {
"product": {
"type": "object",
"description": "Product information",
"properties": {
"name": {
"type": "string",
"description": "Product name"
},
"pricing": {
"type": "object",
"description": "Pricing details",
"properties": {
"current_price": {
"type": "number",
"description": "Current price in USD"
},
"original_price": {
"type": "number",
"description": "Original price before discount"
},
"discount_percent": {
"type": "integer",
"description": "Discount percentage"
}
}
},
"availability": {
"type": "object",
"description": "Stock information",
"properties": {
"in_stock": {
"type": "boolean",
"description": "Whether the item is in stock"
},
"delivery_estimate": {
"type": "string",
"description": "Estimated delivery date"
}
}
}
}
}
}
}
)
```
```typescript TypeScript
const result = await client.runTask({
body: {
prompt: "Get product details including pricing and availability",
url: "https://www.amazon.com/dp/B0EXAMPLE",
data_extraction_schema: {
type: "object",
properties: {
product: {
type: "object",
description: "Product information",
properties: {
name: {
type: "string",
description: "Product name",
},
pricing: {
type: "object",
description: "Pricing details",
properties: {
current_price: {
type: "number",
description: "Current price in USD",
},
original_price: {
type: "number",
description: "Original price before discount",
},
discount_percent: {
type: "integer",
description: "Discount percentage",
},
},
},
availability: {
type: "object",
description: "Stock information",
properties: {
in_stock: {
type: "boolean",
description: "Whether the item is in stock",
},
delivery_estimate: {
type: "string",
description: "Estimated delivery date",
},
},
},
},
},
},
},
},
});
```
```bash cURL
curl -X POST "https://api.skyvern.com/v1/run/tasks" \
-H "x-api-key: $SKYVERN_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Get product details including pricing and availability",
"url": "https://www.amazon.com/dp/B0EXAMPLE",
"data_extraction_schema": {
"type": "object",
"properties": {
"product": {
"type": "object",
"description": "Product information",
"properties": {
"name": {
"type": "string",
"description": "Product name"
},
"pricing": {
"type": "object",
"description": "Pricing details",
"properties": {
"current_price": {
"type": "number",
"description": "Current price in USD"
},
"original_price": {
"type": "number",
"description": "Original price before discount"
},
"discount_percent": {
"type": "integer",
"description": "Discount percentage"
}
}
},
"availability": {
"type": "object",
"description": "Stock information",
"properties": {
"in_stock": {
"type": "boolean",
"description": "Whether the item is in stock"
},
"delivery_estimate": {
"type": "string",
"description": "Estimated delivery date"
}
}
}
}
}
}
}
}'
```
**Output (when completed):**
```json
{
"product": {
"name": "Wireless Bluetooth Headphones",
"pricing": {
"current_price": 79.99,
"original_price": 129.99,
"discount_percent": 38
},
"availability": {
"in_stock": true,
"delivery_estimate": "Tomorrow, Jan 21"
}
}
}
```
---
## Accessing extracted data
The extracted data appears in the `output` field of the completed run. Poll until the task reaches a terminal state, then access the output.
```python Python
result = await client.run_task(
prompt="Get the top post",
url="https://news.ycombinator.com",
data_extraction_schema={
"type": "object",
"properties": {
"title": {"type": "string", "description": "Post title"},
"points": {"type": "integer", "description": "Points"}
}
}
)
run_id = result.run_id
while True:
run = await client.get_run(run_id)
if run.status in ["completed", "failed", "terminated", "timed_out", "canceled"]:
break
await asyncio.sleep(5)
# Access the extracted data
print(f"Output: {run.output}")
```
```typescript TypeScript
const result = await client.runTask({
body: {
prompt: "Get the top post",
url: "https://news.ycombinator.com",
data_extraction_schema: {
type: "object",
properties: {
title: { type: "string", description: "Post title" },
points: { type: "integer", description: "Points" },
},
},
},
});
const runId = result.run_id;
while (true) {
const run = await client.getRun(runId);
if (["completed", "failed", "terminated", "timed_out", "canceled"].includes(run.status)) {
console.log(`Output: ${JSON.stringify(run.output)}`);
break;
}
await new Promise((resolve) => setTimeout(resolve, 5000));
}
```
```bash cURL
RUN_ID="your_run_id_here"
while true; do
RESPONSE=$(curl -s -X GET "https://api.skyvern.com/v1/runs/$RUN_ID" \
-H "x-api-key: $SKYVERN_API_KEY")
STATUS=$(echo "$RESPONSE" | jq -r '.status')
if [[ "$STATUS" == "completed" || "$STATUS" == "failed" || "$STATUS" == "terminated" || "$STATUS" == "timed_out" || "$STATUS" == "canceled" ]]; then
echo "$RESPONSE" | jq '.output'
break
fi
sleep 5
done
```
If using webhooks, the same `output` field appears in the webhook payload.
---
## Next steps
All available parameters for run_task
Execute tasks and retrieve results