Dorod-Sky/docs/cookbooks/healthcare-portal-data.mdx
Naman bf8c7de8f9 cloud ui docs + cookbooks (#4759)
Co-authored-by: Ritik Sahni <ritiksahni0203@gmail.com>
Co-authored-by: Kunal Mishra <kunalm2345@gmail.com>
2026-02-16 22:14:40 +00:00


---
title: Healthcare Portal Data Extraction
subtitle: Extract patient demographics and billing data from OpenEMR
slug: cookbooks/healthcare-portal-data
---
This cookbook extracts two datasets from [OpenEMR](https://www.open-emr.org/), an open-source EHR, using the public demo at `https://demo.openemr.io/openemr/index.php`:

1. **Patient demographics** from Patient/Client > Finder
2. **Encounter billing data** from Reports > Visits > Superbill

**Demo credentials:** `admin` / `pass` (the demo resets daily at 8:00 AM UTC)

---
## Prerequisites
- A [Skyvern Cloud](https://app.skyvern.com) account or [self-hosted](/self-hosted/overview) deployment
- The Skyvern SDK (for API usage)
<CodeGroup>
```bash Python
pip install skyvern
```
```bash TypeScript
npm install @skyvern/client
```
</CodeGroup>
---
## Why a single task isn't enough
A basic task pointed at OpenEMR with a vague prompt partially works, but it runs into the four problems in the table below once you move to production:
<CodeGroup>
```python Python
result = await client.run_task(
    url="https://demo.openemr.io/openemr/index.php",
    prompt="Log in and extract the patient list",
)
```
```typescript TypeScript
const result = await client.runTask({
  body: {
    url: "https://demo.openemr.io/openemr/index.php",
    prompt: "Log in and extract the patient list",
  },
});
```
</CodeGroup>
| Problem | Impact |
|---------|--------|
| No proxy | Production EHR portals sit behind WAFs that block datacenter IPs |
| Login every run | Wastes steps on every run and is fragile when session handling gets complex |
| Vague navigation | OpenEMR uses iframes and dynamic menus, so the agent needs explicit goals |
| No pagination | Only gets page 1 of multi-page results |
The sections below solve each one.
---
## Residential proxies
Route the browser through a residential IP to get past WAF and bot detection. The demo works without a proxy, but production portals typically require one.
<Tabs>
<Tab title="Cloud UI">
In the run panel, expand **Advanced Settings** and set **Proxy Location** to a country (e.g., **United States**).
</Tab>
<Tab title="API / SDK">
<CodeGroup>
```python Python
result = await client.run_task(
    url="https://demo.openemr.io/openemr/index.php",
    prompt="Log in with username 'admin' and password 'pass', confirm the Calendar page loads",
    proxy_location="RESIDENTIAL",
)
```
```typescript TypeScript
const result = await client.runTask({
  body: {
    url: "https://demo.openemr.io/openemr/index.php",
    prompt: "Log in with username 'admin' and password 'pass', confirm the Calendar page loads",
    proxy_location: "RESIDENTIAL",
  },
});
```
</CodeGroup>
</Tab>
</Tabs>
See [Proxy & Geolocation](/going-to-production/proxy-geolocation) for all available locations.
---
## Browser profiles
Log in once, save the browser state as a profile, and skip login on future runs.
<Tabs>
<Tab title="Cloud UI">
<Steps>
<Step title="Create a login workflow">
Go to **Workflows** in the sidebar and create a new workflow. Add a **Navigation** block with URL `https://demo.openemr.io/openemr/index.php` and goal: "Log in with username 'admin' and password 'pass'. Confirm the Calendar page loads."
On the **Start** node, expand the settings and enable **Save & Reuse Session**. Set **Proxy Location** to a country (e.g., **United States**).
<Frame>
<video
controls
muted
playsInline
className="w-full aspect-video rounded-xl"
src="/images/workflow-start.mp4"
></video>
</Frame>
</Step>
<Step title="Run the workflow">
Run the workflow and wait for it to complete.
</Step>
<Step title="Create a profile via API">
Browser profile creation is done via the API. Use the `create_browser_profile` call from the API/SDK tab with the completed workflow run ID. Name it `openemr-demo-admin`.
</Step>
</Steps>
</Tab>
<Tab title="API / SDK">
<CodeGroup>
```python Python
import asyncio

from skyvern import Skyvern


async def main():
    client = Skyvern(api_key="YOUR_API_KEY")

    # 1. Create workflow that saves browser state
    workflow = await client.create_workflow(
        json_definition={
            "title": "OpenEMR Login",
            "persist_browser_session": True,
            "workflow_definition": {
                "parameters": [],
                "blocks": [
                    {
                        "block_type": "navigation",
                        "label": "login",
                        "url": "https://demo.openemr.io/openemr/index.php",
                        "navigation_goal": (
                            "Log in with username 'admin' and password 'pass'. "
                            "Confirm the Calendar page or main dashboard loads."
                        ),
                    }
                ],
            },
        }
    )

    # 2. Run with residential proxy
    run = await client.run_workflow(
        workflow_id=workflow.workflow_permanent_id,
        proxy_location="RESIDENTIAL",
        wait_for_completion=True,
    )
    print(f"Login: {run.status}")  # completed

    # 3. Save profile (retry while session archives)
    profile = None
    for attempt in range(10):
        try:
            profile = await client.create_browser_profile(
                name="openemr-demo-admin",
                workflow_run_id=run.run_id,
            )
            break
        except Exception as e:
            if "persisted" in str(e).lower() and attempt < 9:
                await asyncio.sleep(2)
                continue
            raise
    print(f"Profile: {profile.browser_profile_id}")


asyncio.run(main())
```
```typescript TypeScript
import { Skyvern } from "@skyvern/client";

async function main() {
  const client = new Skyvern({ apiKey: process.env.SKYVERN_API_KEY! });

  // 1. Create workflow that saves browser state
  const workflow = await client.createWorkflow({
    body: {
      json_definition: {
        title: "OpenEMR Login",
        persist_browser_session: true,
        workflow_definition: {
          parameters: [],
          blocks: [
            {
              block_type: "navigation",
              label: "login",
              url: "https://demo.openemr.io/openemr/index.php",
              navigation_goal:
                "Log in with username 'admin' and password 'pass'. " +
                "Confirm the Calendar page or main dashboard loads.",
            },
          ],
        },
      },
    },
  });

  // 2. Run with residential proxy
  const run = await client.runWorkflow({
    body: {
      workflow_id: workflow.workflow_permanent_id,
      proxy_location: "RESIDENTIAL",
    },
    waitForCompletion: true,
  });
  console.log(`Login: ${run.status}`);

  // 3. Save profile (retry while session archives)
  let profile;
  for (let attempt = 0; attempt < 10; attempt++) {
    try {
      profile = await client.createBrowserProfile({
        name: "openemr-demo-admin",
        workflow_run_id: run.run_id,
      });
      break;
    } catch (e) {
      if (String(e).toLowerCase().includes("persisted") && attempt < 9) {
        await new Promise((r) => setTimeout(r, 2000));
        continue;
      }
      throw e;
    }
  }
  console.log(`Profile: ${profile.browser_profile_id}`);
}

main();
```
</CodeGroup>
</Tab>
</Tabs>
<Note>
`persist_browser_session` is a workflow definition property — set it when creating the workflow, not when running it. See [Browser Profiles](/optimization/browser-profiles) for the full lifecycle.
</Note>
---
## Extract patient demographics
Navigate to **Patient/Client > Finder** and extract the results table.
<Tabs>
<Tab title="Cloud UI">
Create a workflow with two blocks:
1. **Navigation** block — URL: `https://demo.openemr.io/openemr/index.php`. Goal: "Click Patient/Client in the top menu, then click Finder. Click Search to display all patients."
2. **Extraction** block — Goal: "Extract all patient rows from the Patient Finder results table." Paste the patient schema into **Data Schema**.
On the **Start** node, set **Proxy Location** to a country (e.g., **United States**). Run the workflow.
</Tab>
<Tab title="API / SDK">
<CodeGroup>
```python Python
import asyncio

from skyvern import Skyvern

PATIENT_SCHEMA = {
    "type": "object",
    "properties": {
        "patients": {
            "type": "array",
            "description": "Patient rows from the Patient Finder results table",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "Patient full name (Last, First)"},
                    "pid": {"type": "string", "description": "Patient ID number"},
                    "dob": {"type": "string", "description": "Date of birth (YYYY-MM-DD)"},
                    "phone_home": {"type": "string", "description": "Home phone number"},
                },
            },
        },
    },
}


async def main():
    client = Skyvern(api_key="YOUR_API_KEY")
    run = await client.run_task(
        url="https://demo.openemr.io/openemr/index.php",
        prompt=(
            "Click Patient/Client in the top menu, then click Finder. "
            "Click Search to display all patients."
        ),
        data_extraction_schema=PATIENT_SCHEMA,
        proxy_location="RESIDENTIAL",
        browser_session_id="YOUR_SESSION_ID",
    )
    while run.status not in ["completed", "failed", "terminated", "timed_out", "canceled"]:
        await asyncio.sleep(5)
        run = await client.get_run(run.run_id)
    print(run.output)


asyncio.run(main())
```
```typescript TypeScript
import { Skyvern } from "@skyvern/client";

const PATIENT_SCHEMA = {
  type: "object",
  properties: {
    patients: {
      type: "array",
      description: "Patient rows from the Patient Finder results table",
      items: {
        type: "object",
        properties: {
          name: { type: "string", description: "Patient full name (Last, First)" },
          pid: { type: "string", description: "Patient ID number" },
          dob: { type: "string", description: "Date of birth (YYYY-MM-DD)" },
          phone_home: { type: "string", description: "Home phone number" },
        },
      },
    },
  },
} as const;

async function main() {
  const client = new Skyvern({ apiKey: process.env.SKYVERN_API_KEY! });
  let run = await client.runTask({
    body: {
      url: "https://demo.openemr.io/openemr/index.php",
      prompt:
        "Click Patient/Client in the top menu, then click Finder. " +
        "Click Search to display all patients.",
      data_extraction_schema: PATIENT_SCHEMA,
      proxy_location: "RESIDENTIAL",
      browser_session_id: "YOUR_SESSION_ID",
    },
  });
  while (!["completed", "failed", "terminated", "timed_out", "canceled"].includes(run.status)) {
    await new Promise((r) => setTimeout(r, 5000));
    run = await client.getRun(run.run_id);
  }
  console.log(JSON.stringify(run.output, null, 2));
}

main();
```
</CodeGroup>
</Tab>
</Tabs>
**Example output:**
```json
{
  "patients": [
    { "name": "Belford, Phil", "pid": "1", "dob": "1972-02-09", "phone_home": "333-444-2222" },
    { "name": "Underwood, Susan Ardmore", "pid": "2", "dob": "1967-02-08", "phone_home": "4443332222" },
    { "name": "Moore, Wanda", "pid": "3", "dob": "2007-02-18", "phone_home": null }
  ]
}
```
<Note>
The demo resets daily and community users add test patients, so exact records may differ.
</Note>
<Note>
Browser profiles cannot be used directly with standalone tasks. Create a [browser session](/optimization/browser-sessions) from the profile first, then pass the session ID. See [Pagination with browser sessions](#pagination-with-browser-sessions) below for the full pattern.
</Note>
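
The example output above shows `phone_home` in three shapes: hyphenated, bare digits, and `null`. The following is a minimal post-processing sketch, assuming you want one phone format and validated ISO dates before loading rows elsewhere. The helper names `normalize_phone` and `normalize_patient` are hypothetical and not part of the Skyvern SDK, and the 10-digit rule is an assumption that happens to fit the demo data.

```python
import re
from datetime import date
from typing import Optional


def normalize_phone(raw: Optional[str]) -> Optional[str]:
    """Return a 10-digit phone as NNN-NNN-NNNN, or None if it cannot be parsed."""
    if not raw:
        return None
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) != 10:  # assumption: 10-digit numbers, as in the demo data
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def normalize_patient(row: dict) -> dict:
    """Copy a patient row with a normalized phone and a validated ISO date of birth."""
    dob = row.get("dob")
    try:
        dob = date.fromisoformat(dob).isoformat() if dob else None
    except (TypeError, ValueError):
        dob = None  # leave unparseable dates empty rather than guessing
    return {**row, "phone_home": normalize_phone(row.get("phone_home")), "dob": dob}


# Rows copied from the example output above
rows = [
    {"name": "Belford, Phil", "pid": "1", "dob": "1972-02-09", "phone_home": "333-444-2222"},
    {"name": "Underwood, Susan Ardmore", "pid": "2", "dob": "1967-02-08", "phone_home": "4443332222"},
    {"name": "Moore, Wanda", "pid": "3", "dob": "2007-02-18", "phone_home": None},
]
cleaned = [normalize_patient(r) for r in rows]
```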
---
## Extract encounter billing data
Navigate to **Reports > Visits > Superbill**, set a date range, and extract the report.
<Tabs>
<Tab title="Cloud UI">
Create a workflow with two blocks:
1. **Navigation** block — Goal: "Click Reports in the top menu, then Visits, then Superbill. Set the From date to 2020-01-01 and the To date to today. Click Submit."
2. **Extraction** block — Goal: "Extract all encounter rows from the Superbill report." Paste the encounter schema into **Data Schema**.
On the **Start** node, set **Proxy Location** to a country (e.g., **United States**). Run the workflow.
</Tab>
<Tab title="API / SDK">
<CodeGroup>
```python Python
import asyncio

from skyvern import Skyvern

ENCOUNTER_SCHEMA = {
    "type": "object",
    "properties": {
        "encounters": {
            "type": "array",
            "description": "Encounter rows from the Superbill report",
            "items": {
                "type": "object",
                "properties": {
                    "patient_name": {"type": "string", "description": "Patient name"},
                    "encounter_date": {"type": "string", "description": "Date of encounter (YYYY-MM-DD)"},
                    "provider": {"type": "string", "description": "Provider name"},
                    "billing_code": {"type": "string", "description": "CPT or billing code"},
                    "code_description": {"type": "string", "description": "Description of the billing code"},
                    "charge": {"type": "number", "description": "Fee amount in USD"},
                },
            },
        },
    },
}


async def main():
    client = Skyvern(api_key="YOUR_API_KEY")
    run = await client.run_task(
        url="https://demo.openemr.io/openemr/index.php",
        prompt=(
            "Click Reports in the top menu, then Visits, then Superbill. "
            "Set the From date to 2020-01-01 and the To date to today. Click Submit."
        ),
        data_extraction_schema=ENCOUNTER_SCHEMA,
        proxy_location="RESIDENTIAL",
        browser_session_id="YOUR_SESSION_ID",
    )
    # Poll until the run reaches a terminal state, as in the patient example
    while run.status not in ["completed", "failed", "terminated", "timed_out", "canceled"]:
        await asyncio.sleep(5)
        run = await client.get_run(run.run_id)
    print(run.output)


asyncio.run(main())
```
```typescript TypeScript
import { Skyvern } from "@skyvern/client";

const ENCOUNTER_SCHEMA = {
  type: "object",
  properties: {
    encounters: {
      type: "array",
      description: "Encounter rows from the Superbill report",
      items: {
        type: "object",
        properties: {
          patient_name: { type: "string", description: "Patient name" },
          encounter_date: { type: "string", description: "Date of encounter (YYYY-MM-DD)" },
          provider: { type: "string", description: "Provider name" },
          billing_code: { type: "string", description: "CPT or billing code" },
          code_description: { type: "string", description: "Description of the billing code" },
          charge: { type: "number", description: "Fee amount in USD" },
        },
      },
    },
  },
} as const;

async function main() {
  const client = new Skyvern({ apiKey: process.env.SKYVERN_API_KEY! });
  let run = await client.runTask({
    body: {
      url: "https://demo.openemr.io/openemr/index.php",
      prompt:
        "Click Reports in the top menu, then Visits, then Superbill. " +
        "Set the From date to 2020-01-01 and the To date to today. Click Submit.",
      data_extraction_schema: ENCOUNTER_SCHEMA,
      proxy_location: "RESIDENTIAL",
      browser_session_id: "YOUR_SESSION_ID",
    },
  });
  // Poll until the run reaches a terminal state, as in the patient example
  while (!["completed", "failed", "terminated", "timed_out", "canceled"].includes(run.status)) {
    await new Promise((r) => setTimeout(r, 5000));
    run = await client.getRun(run.run_id);
  }
  console.log(JSON.stringify(run.output, null, 2));
}

main();
```
</CodeGroup>
</Tab>
</Tabs>
**Example output:**
```json
{
  "encounters": [
    {
      "patient_name": "Phil Lopez",
      "encounter_date": "2024-06-01",
      "provider": "Administrator Administrator",
      "billing_code": "99213",
      "code_description": "Office/outpatient visit, est patient, low complexity",
      "charge": 50.00
    }
  ]
}
```
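
Once the encounters are extracted, a common next step is to total the charges per billing code. The sketch below is one way to do that, assuming the output shape shown above. `total_charges_by_code` is a hypothetical helper, not a Skyvern API, and every sample row except the first is invented for illustration. It uses `Decimal` so that currency sums avoid floating-point drift, and it files rows without a code under an assumed `UNKNOWN` bucket.

```python
from collections import defaultdict
from decimal import Decimal


def total_charges_by_code(encounters: list[dict]) -> dict[str, Decimal]:
    """Sum the charge field per billing code, skipping rows with no charge."""
    totals: dict[str, Decimal] = defaultdict(Decimal)  # Decimal() starts at 0
    for enc in encounters:
        charge = enc.get("charge")
        if charge is None:
            continue
        code = enc.get("billing_code") or "UNKNOWN"  # assumed bucket name
        totals[code] += Decimal(str(charge))  # str() avoids binary float artifacts
    return dict(totals)


encounters = [
    {"patient_name": "Phil Lopez", "billing_code": "99213", "charge": 50.00},
    # Hypothetical rows, added only to exercise the helper
    {"patient_name": "Example Patient", "billing_code": "99213", "charge": 25.50},
    {"patient_name": "Example Patient", "billing_code": None, "charge": 10},
    {"patient_name": "Example Patient", "billing_code": "99214", "charge": None},
]
totals = total_charges_by_code(encounters)
```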
---
## Pagination with browser sessions
A [Browser Profile](/optimization/browser-profiles) is a saved snapshot. A [Browser Session](/optimization/browser-sessions) is a live browser instance that persists between tasks. Use sessions to paginate: extract page 1, click Next, extract page 2.
<Tabs>
<Tab title="Cloud UI">
<Steps>
<Step title="Create a session">
Go to **Browsers** in the sidebar. Click **Create Session**. Set **Proxy Location** to a country (e.g., **United States**) and configure the timeout.
<Frame>
<video
controls
muted
playsInline
className="w-full aspect-video rounded-xl"
src="/images/browser-session-create.mp4"
></video>
</Frame>
</Step>
<Step title="Navigate and extract page 1">
Run a task against the session: "Click Patient/Client > Finder. Click Search. Extract all patient rows."
</Step>
<Step title="Extract subsequent pages">
Run another task against the same session: "Click Next to go to the next page. Extract all patient rows." Repeat until no more results.
</Step>
</Steps>
</Tab>
<Tab title="API / SDK">
<CodeGroup>
```python Python
import asyncio

from skyvern import Skyvern

PATIENT_SCHEMA = {
    "type": "object",
    "properties": {
        "patients": {
            "type": "array",
            "description": "Patient rows from the current page",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "Patient full name (Last, First)"},
                    "pid": {"type": "string", "description": "Patient ID number"},
                    "dob": {"type": "string", "description": "Date of birth (YYYY-MM-DD)"},
                    "phone_home": {"type": "string", "description": "Home phone number"},
                },
            },
        },
    },
}


async def extract_page(client, session_id, page_num):
    prompt = (
        "Click Patient/Client in the top menu, then click Finder. "
        "Click Search to display all patients. "
        "Extract all patient rows from the results table."
    ) if page_num == 1 else (
        "Click Next to go to the next page of results. "
        "Extract all patient rows from the table."
    )
    run = await client.run_task(
        url="https://demo.openemr.io/openemr/index.php",
        prompt=prompt,
        browser_session_id=session_id,
        data_extraction_schema=PATIENT_SCHEMA,
    )
    while run.status not in ["completed", "failed", "terminated", "timed_out", "canceled"]:
        await asyncio.sleep(5)
        run = await client.get_run(run.run_id)
    return run


async def main():
    client = Skyvern(api_key="YOUR_API_KEY")
    session = await client.create_browser_session(
        browser_profile_id="YOUR_PROFILE_ID",
        proxy_location="RESIDENTIAL",
    )
    all_patients = []
    for page in range(1, 11):
        run = await extract_page(client, session.browser_session_id, page)
        if run.status != "completed":
            break
        # run.output can be empty, so fall back to an empty dict before reading it
        patients = (run.output or {}).get("patients", [])
        if not patients:
            break
        all_patients.extend(patients)
        print(f"Page {page}: {len(patients)} patients ({len(all_patients)} total)")
    print(f"Done: {len(all_patients)} patients")


asyncio.run(main())
```
```typescript TypeScript
import { Skyvern } from "@skyvern/client";

const PATIENT_SCHEMA = {
  type: "object",
  properties: {
    patients: {
      type: "array",
      description: "Patient rows from the current page",
      items: {
        type: "object",
        properties: {
          name: { type: "string", description: "Patient full name (Last, First)" },
          pid: { type: "string", description: "Patient ID number" },
          dob: { type: "string", description: "Date of birth (YYYY-MM-DD)" },
          phone_home: { type: "string", description: "Home phone number" },
        },
      },
    },
  },
} as const;

async function extractPage(client: Skyvern, sessionId: string, pageNum: number) {
  const prompt =
    pageNum === 1
      ? "Click Patient/Client in the top menu, then click Finder. " +
        "Click Search to display all patients. " +
        "Extract all patient rows from the results table."
      : "Click Next to go to the next page of results. " +
        "Extract all patient rows from the table.";
  let run = await client.runTask({
    body: {
      url: "https://demo.openemr.io/openemr/index.php",
      prompt,
      browser_session_id: sessionId,
      data_extraction_schema: PATIENT_SCHEMA,
    },
  });
  while (!["completed", "failed", "terminated", "timed_out", "canceled"].includes(run.status)) {
    await new Promise((r) => setTimeout(r, 5000));
    run = await client.getRun(run.run_id);
  }
  return run;
}

async function main() {
  const client = new Skyvern({ apiKey: process.env.SKYVERN_API_KEY! });
  const session = await client.createBrowserSession({
    browser_profile_id: "YOUR_PROFILE_ID",
    proxy_location: "RESIDENTIAL",
  });
  const allPatients: any[] = [];
  for (let page = 1; page <= 10; page++) {
    const run = await extractPage(client, session.browser_session_id, page);
    if (run.status !== "completed") break;
    const patients = run.output?.patients ?? [];
    if (patients.length === 0) break;
    allPatients.push(...patients);
    console.log(`Page ${page}: ${patients.length} patients (${allPatients.length} total)`);
  }
  console.log(`Done: ${allPatients.length} patients`);
}

main();
```
</CodeGroup>
**Expected output:**
```
Page 1: 25 patients (25 total)
Page 2: 18 patients (43 total)
Done: 43 patients
```
</Tab>
</Tabs>
<Warning>
You cannot use `browser_profile_id` and `browser_session_id` in the same request. Use the profile to create the session, then pass only the session ID to tasks.
</Warning>
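
The pagination loop stops on an empty page, but an agent that fails to advance may re-extract a page it has already seen. One possible guard, sketched here under the assumption that `pid` uniquely identifies a patient, is to merge pages by `pid` and stop as soon as a page contributes nothing new. `merge_pages` is a hypothetical helper, not part of the Skyvern SDK, and the sample pages are invented.

```python
from typing import Iterable


def merge_pages(pages: Iterable[list[dict]]) -> list[dict]:
    """Merge per-page rows, keeping the first row seen for each pid.

    Rows without a pid are skipped. Merging stops at the first page that
    adds no new pid, which is treated as a repeated page.
    """
    seen: set[str] = set()
    merged: list[dict] = []
    for rows in pages:
        added = 0
        for row in rows:
            pid = row.get("pid")
            if pid is None or pid in seen:
                continue
            seen.add(pid)
            merged.append(row)
            added += 1
        if added == 0:
            break  # nothing new on this page, so stop paginating
    return merged


# Hypothetical pages: page 2 overlaps page 1, and page 3 repeats page 2
pages = [
    [{"pid": "1"}, {"pid": "2"}],
    [{"pid": "2"}, {"pid": "3"}],
    [{"pid": "3"}],
    [{"pid": "4"}],
]
merged = merge_pages(pages)
```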
---
## Error handling
OpenEMR can time out or show session-expired pages. Use `error_code_mapping` on workflow blocks to classify failures, and `max_retries` to retry automatically.
<Tabs>
<Tab title="Cloud UI">
On each Navigation and Extraction block, expand **Advanced Settings** and enable **Error Messages**. Add this JSON:
```json
{ "session_expired": "Session expired, login required, or access denied page" }
```
<Frame>
<video
controls
muted
playsInline
className="w-full aspect-video rounded-xl"
src="/images/navigation-error-messages.mp4"
></video>
</Frame>
<Note>
`max_retries` is only available via the API. In the Cloud UI, Skyvern uses its default retry behavior. For fine-grained retry control, use the API/SDK approach.
</Note>
</Tab>
<Tab title="API / SDK">
Set `error_code_mapping` and `max_retries` directly on workflow blocks:
```json
{
  "block_type": "extraction",
  "label": "extract_patients",
  "data_extraction_goal": "Extract all patient rows from the table",
  "error_code_mapping": {
    "session_expired": "Session expired, login required, or access denied page"
  },
  "max_retries": 3
}
```
For standalone tasks, handle retries in your calling code:
<CodeGroup>
```python Python
# Reuses extract_page() and the asyncio import from the pagination example
async def run_with_retry(client, session_id, page_num, max_retries=3):
    for attempt in range(max_retries + 1):
        run = await extract_page(client, session_id, page_num)
        if run.status == "completed":
            return run
        is_session_error = "session" in (run.failure_reason or "").lower()
        if is_session_error and attempt < max_retries:
            # Exponential backoff: 5s, 10s, 20s, ...
            await asyncio.sleep(2 ** attempt * 5)
            continue
        return run
    return run
```
```typescript TypeScript
async function runWithRetry(
  client: Skyvern, sessionId: string, pageNum: number, maxRetries = 3
) {
  let run;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    run = await extractPage(client, sessionId, pageNum);
    if (run.status === "completed") return run;
    const isSessionError = (run.failure_reason ?? "").toLowerCase().includes("session");
    if (isSessionError && attempt < maxRetries) {
      await new Promise((r) => setTimeout(r, Math.pow(2, attempt) * 5000));
      continue;
    }
    return run;
  }
  return run!;
}
```
</CodeGroup>
</Tab>
</Tabs>
See [Error Handling](/going-to-production/error-handling) and [CAPTCHA & Bot Detection](/going-to-production/captcha-bot-detection) for more.
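
The standalone retry helper sleeps for `2 ** attempt * 5` seconds between attempts. To make the waiting time concrete, this small sketch computes that schedule. `backoff_delays` is a hypothetical name used for illustration; the defaults mirror the values in the helper above.

```python
def backoff_delays(max_retries: int = 3, base_seconds: int = 5) -> list[int]:
    """Seconds slept before each retry, doubling every time."""
    # Exponentiation binds tighter than multiplication: (2 ** attempt) * base_seconds
    return [2 ** attempt * base_seconds for attempt in range(max_retries)]


delays = backoff_delays()
worst_case_wait = sum(delays)  # total sleep if every retry is used
print(delays, worst_case_wait)  # prints [5, 10, 20] 35
```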
---
## Complete workflow
This workflow combines everything: navigate to the Patient Finder, extract demographics, navigate to Superbill, and extract billing data — with error recovery and residential proxy.
<Tabs>
<Tab title="Cloud UI">
<Steps>
<Step title="Create the workflow">
Go to **Workflows** and create a new workflow named "OpenEMR Daily Extract." On the **Start** node, enable **Save & Reuse Session** and set **Proxy Location** to a country (e.g., **United States**).
</Step>
<Step title="Block 1: Navigation — open Patient Finder">
Add a **Navigation** block. Set URL to `https://demo.openemr.io/openemr/index.php` and goal: "Click Patient/Client > Finder. Click Search to display all patients. If a login page appears, log in with 'admin'/'pass'." In **Advanced Settings**, enable **Error Messages** and add `{"session_expired": "Session expired, login required, or access denied page"}`.
</Step>
<Step title="Block 2: Extraction — patient demographics">
Add an **Extraction** block. Set goal: "Extract all patient rows from the Patient Finder results table." Paste the patient schema into **Data Schema**.
</Step>
<Step title="Block 3: Navigation — open Superbill">
Add another **Navigation** block. Set goal: "Click Reports > Visits > Superbill. Set From to 2020-01-01, To to today. Click Submit." Add the same error messages mapping.
</Step>
<Step title="Block 4: Extraction — encounter billing">
Add another **Extraction** block. Set goal: "Extract all encounter rows from the Superbill report." Paste the encounter schema into **Data Schema**.
</Step>
<Step title="Run">
Click **Run**. The workflow navigates to the Patient Finder, extracts demographics, then navigates to Superbill and extracts billing data.
</Step>
</Steps>
For multi-page results, combine with the [pagination pattern](#pagination-with-browser-sessions) above.
</Tab>
<Tab title="API / SDK">
<CodeGroup>
```python Python
import asyncio

from skyvern import Skyvern

PATIENT_SCHEMA = {
    "type": "object",
    "properties": {
        "patients": {
            "type": "array",
            "description": "Patient rows from the Patient Finder results table",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "Patient full name (Last, First)"},
                    "pid": {"type": "string", "description": "Patient ID number"},
                    "dob": {"type": "string", "description": "Date of birth (YYYY-MM-DD)"},
                    "phone_home": {"type": "string", "description": "Home phone number"},
                },
            },
        },
    },
}

ENCOUNTER_SCHEMA = {
    "type": "object",
    "properties": {
        "encounters": {
            "type": "array",
            "description": "Encounter rows from the Superbill report",
            "items": {
                "type": "object",
                "properties": {
                    "patient_name": {"type": "string", "description": "Patient name"},
                    "encounter_date": {"type": "string", "description": "Date of encounter (YYYY-MM-DD)"},
                    "provider": {"type": "string", "description": "Provider name"},
                    "billing_code": {"type": "string", "description": "CPT or billing code"},
                    "code_description": {"type": "string", "description": "Description of the billing code"},
                    "charge": {"type": "number", "description": "Fee amount in USD"},
                },
            },
        },
    },
}

SESSION_ERROR = "Session expired, login required, or access denied page"


async def main():
    client = Skyvern(api_key="YOUR_API_KEY")
    workflow = await client.create_workflow(
        json_definition={
            "title": "OpenEMR Daily Extract",
            "persist_browser_session": True,
            "workflow_definition": {
                "parameters": [],
                "blocks": [
                    {
                        "block_type": "navigation",
                        "label": "open_patient_finder",
                        "url": "https://demo.openemr.io/openemr/index.php",
                        "navigation_goal": (
                            "Click Patient/Client in the top menu, then click Finder. "
                            "Click Search to display all patients. "
                            "If a login page appears, log in with username 'admin' and password 'pass'."
                        ),
                        "error_code_mapping": {"session_expired": SESSION_ERROR},
                        "max_retries": 3,
                    },
                    {
                        "block_type": "extraction",
                        "label": "extract_patients",
                        "data_extraction_goal": "Extract all patient rows from the Patient Finder results table",
                        "data_schema": PATIENT_SCHEMA,
                        "error_code_mapping": {"session_expired": SESSION_ERROR},
                        "max_retries": 2,
                    },
                    {
                        "block_type": "navigation",
                        "label": "open_superbill",
                        "navigation_goal": (
                            "Click Reports in the top menu, then Visits, then Superbill. "
                            "Set the From date to 2020-01-01 and the To date to today. Click Submit."
                        ),
                        "error_code_mapping": {"session_expired": SESSION_ERROR},
                        "max_retries": 3,
                    },
                    {
                        "block_type": "extraction",
                        "label": "extract_encounters",
                        "data_extraction_goal": "Extract all encounter rows from the Superbill report",
                        "data_schema": ENCOUNTER_SCHEMA,
                        "error_code_mapping": {"session_expired": SESSION_ERROR},
                        "max_retries": 2,
                    },
                ],
            },
        }
    )
    print(f"Workflow: {workflow.workflow_permanent_id}")

    run = await client.run_workflow(
        workflow_id=workflow.workflow_permanent_id,
        browser_profile_id="YOUR_PROFILE_ID",
        proxy_location="RESIDENTIAL",
        wait_for_completion=True,
    )
    print(f"Status: {run.status}")
    print(f"Output: {run.output}")


asyncio.run(main())
```
```typescript TypeScript
import { Skyvern } from "@skyvern/client";

const PATIENT_SCHEMA = {
  type: "object",
  properties: {
    patients: {
      type: "array",
      description: "Patient rows from the Patient Finder results table",
      items: {
        type: "object",
        properties: {
          name: { type: "string", description: "Patient full name (Last, First)" },
          pid: { type: "string", description: "Patient ID number" },
          dob: { type: "string", description: "Date of birth (YYYY-MM-DD)" },
          phone_home: { type: "string", description: "Home phone number" },
        },
      },
    },
  },
} as const;

const ENCOUNTER_SCHEMA = {
  type: "object",
  properties: {
    encounters: {
      type: "array",
      description: "Encounter rows from the Superbill report",
      items: {
        type: "object",
        properties: {
          patient_name: { type: "string", description: "Patient name" },
          encounter_date: { type: "string", description: "Date of encounter (YYYY-MM-DD)" },
          provider: { type: "string", description: "Provider name" },
          billing_code: { type: "string", description: "CPT or billing code" },
          code_description: { type: "string", description: "Description of the billing code" },
          charge: { type: "number", description: "Fee amount in USD" },
        },
      },
    },
  },
} as const;

const SESSION_ERROR = "Session expired, login required, or access denied page";

async function main() {
  const client = new Skyvern({ apiKey: process.env.SKYVERN_API_KEY! });
  const workflow = await client.createWorkflow({
    body: {
      json_definition: {
        title: "OpenEMR Daily Extract",
        persist_browser_session: true,
        workflow_definition: {
          parameters: [],
          blocks: [
            {
              block_type: "navigation",
              label: "open_patient_finder",
              url: "https://demo.openemr.io/openemr/index.php",
              navigation_goal:
                "Click Patient/Client in the top menu, then click Finder. " +
                "Click Search to display all patients. " +
                "If a login page appears, log in with username 'admin' and password 'pass'.",
              error_code_mapping: { session_expired: SESSION_ERROR },
              max_retries: 3,
            },
            {
              block_type: "extraction",
              label: "extract_patients",
              data_extraction_goal: "Extract all patient rows from the Patient Finder results table",
              data_schema: PATIENT_SCHEMA,
              error_code_mapping: { session_expired: SESSION_ERROR },
              max_retries: 2,
            },
            {
              block_type: "navigation",
              label: "open_superbill",
              navigation_goal:
                "Click Reports in the top menu, then Visits, then Superbill. " +
                "Set the From date to 2020-01-01 and the To date to today. Click Submit.",
              error_code_mapping: { session_expired: SESSION_ERROR },
              max_retries: 3,
            },
            {
              block_type: "extraction",
              label: "extract_encounters",
              data_extraction_goal: "Extract all encounter rows from the Superbill report",
              data_schema: ENCOUNTER_SCHEMA,
              error_code_mapping: { session_expired: SESSION_ERROR },
              max_retries: 2,
            },
          ],
        },
      },
    },
  });
  console.log(`Workflow: ${workflow.workflow_permanent_id}`);

  const run = await client.runWorkflow({
    body: {
      workflow_id: workflow.workflow_permanent_id,
      browser_profile_id: "YOUR_PROFILE_ID",
      proxy_location: "RESIDENTIAL",
    },
    waitForCompletion: true,
  });
  console.log(`Status: ${run.status}`);
  console.log(`Output: ${JSON.stringify(run.output, null, 2)}`);
}

main();
```
</CodeGroup>
</Tab>
</Tabs>
| Technique | Purpose |
|-----------|---------|
| Residential proxy | Bypass WAF/bot detection |
| Browser profile | Skip login on every run |
| Navigation goals | Explicit menu clicks for iframe-based UI |
| JSON schemas | Consistent, structured output |
| Session reuse | Paginate multi-page results |
| Error mapping + retries | Recover from session timeouts |
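
A schema guides the extraction, but it can still be worth checking the returned data before loading it downstream. The sketch below is a deliberately small type checker that understands only the `type`, `properties`, and `items` keywords used in this cookbook. It is not a full JSON Schema validator, and `type_errors` is a hypothetical helper. It accepts `None` anywhere, on the assumption that missing values come back as `null`, as in the patient example output. The schema here is a trimmed copy of the encounter schema.

```python
# Maps the JSON Schema type names used in this cookbook to Python types
JSON_TYPES = {"string": str, "number": (int, float), "object": dict, "array": list}

ENCOUNTER_SCHEMA = {
    "type": "object",
    "properties": {
        "encounters": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "patient_name": {"type": "string"},
                    "charge": {"type": "number"},
                },
            },
        },
    },
}


def type_errors(value, schema, path="$"):
    """Return messages for every place where value has the wrong declared type."""
    if value is None:
        return []  # assumption: null is allowed for any field
    expected = JSON_TYPES.get(schema.get("type"))
    # bool is a subclass of int in Python, so reject it explicitly
    if expected and (isinstance(value, bool) or not isinstance(value, expected)):
        return [f"{path}: expected {schema['type']}, got {type(value).__name__}"]
    errors = []
    if isinstance(value, dict):
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors += type_errors(value[key], sub_schema, f"{path}.{key}")
    elif isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors += type_errors(item, schema["items"], f"{path}[{index}]")
    return errors


good = {"encounters": [{"patient_name": "Phil Lopez", "charge": 50.00}]}
bad = {"encounters": [{"patient_name": "Phil Lopez", "charge": "50.00"}]}
good_errors = type_errors(good, ENCOUNTER_SCHEMA)
bad_errors = type_errors(bad, ENCOUNTER_SCHEMA)
```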
<Tip>
The OpenEMR demo resets daily at 8:00 AM UTC, so profiles expire every day. In production, re-run your login workflow weekly or whenever extractions fail with auth errors. See [Browser Profiles](/optimization/browser-profiles) for the refresh pattern.
</Tip>
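
For the demo specifically, the daily reset makes it possible to tell in advance whether a saved profile is stale. The following sketch assumes you record the time a profile was saved yourself; the `created_at` value and both function names are hypothetical. Only the 8:00 AM UTC reset time comes from this cookbook.

```python
from datetime import datetime, time, timedelta, timezone

RESET_TIME_UTC = time(8, 0)  # the demo resets daily at 8:00 AM UTC


def last_reset(now: datetime) -> datetime:
    """Most recent 08:00 UTC reset at or before now (now must be timezone-aware)."""
    now = now.astimezone(timezone.utc)
    today_reset = datetime.combine(now.date(), RESET_TIME_UTC, tzinfo=timezone.utc)
    return today_reset if now >= today_reset else today_reset - timedelta(days=1)


def profile_needs_refresh(created_at: datetime, now: datetime) -> bool:
    """True if the profile was saved before the most recent demo reset."""
    return created_at.astimezone(timezone.utc) < last_reset(now)


# Hypothetical timestamps: a profile saved at 07:30 UTC, checked at 09:00 UTC
now = datetime(2026, 2, 17, 9, 0, tzinfo=timezone.utc)
saved_before_reset = datetime(2026, 2, 17, 7, 30, tzinfo=timezone.utc)
saved_after_reset = datetime(2026, 2, 17, 8, 30, tzinfo=timezone.utc)
stale = profile_needs_refresh(saved_before_reset, now)
fresh = profile_needs_refresh(saved_after_reset, now)
```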
---
## Resources
<CardGroup cols={2}>
<Card
title="Browser Profiles"
icon="user"
href="/optimization/browser-profiles"
>
Full lifecycle: create, refresh, and delete saved browser state
</Card>
<Card
title="Proxy & Geolocation"
icon="globe"
href="/going-to-production/proxy-geolocation"
>
All proxy locations and country-specific routing options
</Card>
<Card
title="Credential Management"
icon="key"
href="/sdk-reference/credentials"
>
Securely store and use login credentials
</Card>
<Card
title="Error Handling"
icon="triangle-exclamation"
href="/going-to-production/error-handling"
>
Error code mapping, failure classification, and retry strategies
</Card>
<Card
title="Extract Structured Data"
icon="table"
href="/running-automations/extract-structured-data"
>
JSON schema design and the interactive schema builder
</Card>
</CardGroup>