Dorod-Sky/fern/introduction.mdx
2025-05-19 00:53:57 -07:00

---
title: Introduction
subtitle: 🐉 Automate Browser-based workflows using LLMs and Computer Vision 🐉
slug: introduction
---
[Skyvern](https://www.skyvern.com) automates browser-based workflows using LLMs and computer vision. It provides a simple API endpoint to fully automate manual workflows, replacing brittle or unreliable automation solutions.
<p align="center">
<img src="./images/geico_shu_recording_cropped.gif" alt="Recording of Skyvern automating an insurance quote being generated on Geico.com" />
</p>
# Why Skyvern?
Traditional approaches to browser automation required writing custom scripts for each website, often relying on DOM parsing and XPath-based interactions that would break whenever the website layout changed.
Instead of relying only on code-defined XPath interactions, Skyvern combines prompts with computer vision and LLMs to parse items in the viewport in real time, create a plan for interaction, and then interact with them.
This approach gives us a few advantages:
1. Skyvern can operate on websites it has never seen before, because it maps visual elements to the actions needed to complete a workflow, without any custom code
1. Skyvern is resistant to website layout changes, because there are no predetermined XPaths or other selectors that our system looks for while navigating
1. Skyvern can take a single workflow and apply it to a large number of websites, because it can reason through the interactions needed to complete the workflow
1. Skyvern leverages LLMs to reason through interactions so that it can cover complex situations. Examples include:
    1. If you wanted to get an auto insurance quote from Geico, the answer to a common question, “Were you eligible to drive at 18?”, could be inferred from the driver receiving their license at age 16
    1. If you were doing competitor analysis, it understands that an Arnold Palmer 22 oz can at 7-Eleven is almost certainly the same product as a 23 oz can at Gopuff (even though the sizes differ slightly, which could be a rounding error!)
Want to see examples of Skyvern in action? [Check out some examples we have here](/getting-started/skyvern-in-action)
# How it works
Skyvern was inspired by the Task-Driven autonomous agent design popularized by [BabyAGI](https://github.com/yoheinakajima/babyagi) and [AutoGPT](https://github.com/Significant-Gravitas/AutoGPT) -- with one major bonus: we give Skyvern the ability to interact with websites using browser automation libraries like [Playwright](https://playwright.dev/).
<picture>
<img className="hidden dark:block" src="./images/skyvern-system-diagram-dark.png" alt="Skyvern's system diagram"/>
<img className="block dark:hidden" src="./images/skyvern-system-diagram-light.png" alt="Skyvern's system diagram" />
</picture>
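The task-driven loop described above (observe the viewport, plan an interaction, act on it) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not Skyvern's actual implementation: `Action`, `AgentRun`, `run_agent`, and the `observe`/`plan`/`act` callables are hypothetical names invented for this example. In a real system the planner would call an LLM and the actor would drive a browser through a library such as Playwright.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """One planned interaction. All fields are hypothetical, for illustration only."""
    kind: str          # e.g. "type", "click", or "done"
    target: str = ""   # a human-readable element label, not an XPath or selector
    text: str = ""     # text to enter, if any


@dataclass
class AgentRun:
    goal: str
    history: list[Action] = field(default_factory=list)


def run_agent(
    goal: str,
    observe: Callable[[], list[str]],
    plan: Callable[[str, list[str], list[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> AgentRun:
    """Repeat observe -> plan -> act until the planner says "done" or steps run out."""
    run = AgentRun(goal=goal)
    for _ in range(max_steps):
        elements = observe()                        # what is visible right now
        action = plan(goal, elements, run.history)  # decide the next step
        run.history.append(action)
        if action.kind == "done":
            break
        act(action)                                 # perform it in the browser
    return run


def demo() -> AgentRun:
    """Run the loop with stub callables standing in for the LLM and the browser."""
    performed: list[str] = []

    def observe() -> list[str]:
        return ["search box", "submit button"]

    def plan(goal: str, elements: list[str], history: list[Action]) -> Action:
        if not history:
            return Action("type", "search box", goal)
        if len(history) == 1:
            return Action("click", "submit button")
        return Action("done")

    def act(action: Action) -> None:
        performed.append(f"{action.kind}:{action.target}")

    return run_agent("top post on hackernews", observe, plan, act)
```

Because the planner receives fresh observations on every iteration, nothing in the loop depends on a selector that was fixed ahead of time, which is the property the layout-change argument above relies on.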
## Skyvern use-cases
<CardGroup cols={2}>
<Card
title="Automatically apply to jobs"
icon="magnifying-glass"
href="/getting-started/skyvern-in-action#automatically-apply-to-jobs-on-sites-like-lever-co"
>
Watch Skyvern automatically apply to jobs
</Card>
<Card
title="Automate e-commerce transactions"
icon="cart-shopping"
href="/getting-started/skyvern-in-action#automate-transactions-on-e-commerce-websites"
>
Watch Skyvern automate purchases on e-commerce websites
</Card>
<Card
title="Interact with government websites"
icon="landmark-dome"
href="/getting-started/skyvern-in-action#navigating-to-government-websites-to-register-accounts-or-fill-out-forms"
>
Watch Skyvern automate interacting with government websites
</Card>
<Card
title="Generate insurance quotes"
icon="clipboard"
href="/getting-started/skyvern-in-action#retrieving-insurance-quotes-from-insurance-providers-in-any-language"
>
Watch Skyvern navigate complex multi-page forms in any language
</Card>
</CardGroup>
## Quickstart
### Python SDK Installation
> ⚠️ **REQUIREMENT**: Skyvern runs with Python 3.11 ⚠️
```bash
pip install skyvern
```
### Run Task
#### Run Task On A Skyvern Service
Get your API key from [Skyvern Cloud](https://app.skyvern.com/settings). Send the task to Skyvern Cloud:
```python
import asyncio
from skyvern import Skyvern

skyvern = Skyvern(api_key="YOUR API KEY")
# OR pass the base_url to use any Skyvern service
# skyvern = Skyvern(base_url="http://localhost:8000", api_key="YOUR API KEY")

async def main():
    # run_task is awaited, so it has to run inside an async function
    task = await skyvern.agent.run_task(prompt="Find the top post on hackernews today")
    print(task)
asyncio.run(main())
```
More API & SDK information can be found in the [API Reference](/api-reference) section.
#### Run Task Locally
You can also run browser tasks locally in your Python code, but it takes a bit more effort to set up the environment:
1. **Configure Skyvern**: Run the setup wizard, which guides you through the configuration process, including Skyvern [MCP](https://github.com/Skyvern-AI/skyvern/blob/main/integrations/mcp/README.md) integration. This generates a `.env` file that holds the configuration settings.
```bash
skyvern init
```
2. **Run Task In Python Code**
```python
import asyncio
from skyvern import Skyvern

async def main():
    skyvern = Skyvern()
    task = await skyvern.run_task(prompt="Find the top post on hackernews today")
    print(task)
asyncio.run(main())
```
<CardGroup cols={2}>
<Card
title="Quickstart (Skyvern Cloud)"
icon="cloud"
href="https://app.skyvern.com"
>
Start automating tasks with the hosted version of Skyvern
</Card>
<Card
title="Quickstart (Open Source)"
icon="github"
href="https://github.com/Skyvern-AI/Skyvern"
>
Start automating tasks in your own cloud
</Card>
</CardGroup>