Karishma Shukla 46dd85cc2b chore: -rm alts
2026-01-03 20:22:50 +05:30

<h2 align="center">
<div>
<a href="https://www.maxun.dev/?ref=ghread">
<img src="/src/assets/maxunlogo.png" width="70" />
<br>
Maxun
</a>
</div>
Transform the Web into Structured Intelligence<br>
</h2>

<p align="center">
✨ Turn any website into clean, contextualized data pipelines for your AI applications ✨
<br />
Maxun is the easiest way to extract web data with no code.
</p>

<p align="center">
<a href="https://app.maxun.dev/?ref=ghread"><b>Go To App</b></a> •
<a href="https://docs.maxun.dev/?ref=ghread"><b>Documentation</b></a> •
<a href="https://www.maxun.dev/?ref=ghread"><b>Website</b></a> •
<a href="https://discord.gg/5GbPjBUkws"><b>Discord</b></a> •
<a href="https://www.youtube.com/@MaxunOSS?ref=ghread"><b>Watch Tutorials</b></a>
<br />
<br />
<a href="https://trendshift.io/repositories/12113" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12113" alt="getmaxun%2Fmaxun | Trendshift" style="width: 250px; height: 55px; margin-top: 10px;" width="250" height="55"/></a>
</p>

## What is Maxun?
Maxun helps you transform websites into structured APIs, clean markdown for AI workflows, and production-ready data pipelines — all in minutes.
### Ecosystem
1. **[Extract](https://docs.maxun.dev/category/extract)** - Emulate real user behavior and collect structured data from any website. No code required.
   * **[Recorder Mode](https://docs.maxun.dev/robot/extract/robot-actions)** - Record your actions as you browse; Maxun turns them into a reusable extraction robot.
   * **[AI Mode](https://docs.maxun.dev/robot/extract/llm-extraction)** - Describe what you want in natural language and let LLM-powered extraction do the rest.
2. **[Scrape](https://docs.maxun.dev/robot/scrape/scrape-robots)** - Convert full webpages into clean Markdown or HTML and capture screenshots. Ideal for AI workflows, agents, and document processing. No code required.
3. **[SDK](https://docs.maxun.dev/sdk/sdk-overview)** - A complete developer toolkit for scraping, extraction, scheduling, and end-to-end data automation.

Whether you prefer browsing through a website or integrating automation into your codebase, Maxun adapts to your workflow.
## How Does It Work?
Maxun uses web robots to power everything you can do on the platform. There are two types of robots, each designed for a different job.
### 1. Extract Robots
**Extract robots emulate real user behavior and capture structured data.**

Choose how to build them:
#### a. Recorder Mode: Record your actions as you browse
- Build robots visually by browsing like a human.
- Perfect for structured, deterministic data extraction.
##### Example: Extract 10 Property Listings from Airbnb
https://github.com/user-attachments/assets/c6baa75f-b950-482c-8d26-8a8b6c5382c3
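
Conceptually, a recorded robot is an ordered list of the steps you took while browsing, and it can be replayed later. The TypeScript sketch below illustrates only that idea. The `Action` union, the `PageDriver` interface, `replay`, and the example selectors are hypothetical names chosen for illustration. They are not Maxun's robot format or API.

```typescript
// Hypothetical action types for illustration; Maxun's real robot format may differ.
type Action =
  | { kind: "goto"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "extractList"; itemSelector: string; fields: Record<string, string> };

type Row = Record<string, string>;

// Hypothetical browser abstraction; a real one would wrap a headless browser page.
interface PageDriver {
  goto(url: string): Promise<void>;
  click(selector: string): Promise<void>;
  // One row per element matching itemSelector; each field value is read from
  // the child element matched by that field's CSS selector.
  queryList(itemSelector: string, fields: Record<string, string>): Promise<Row[]>;
}

// Replays the recorded steps in order and collects every extracted row.
async function replay(actions: Action[], page: PageDriver): Promise<Row[]> {
  const rows: Row[] = [];
  for (const action of actions) {
    switch (action.kind) {
      case "goto":
        await page.goto(action.url);
        break;
      case "click":
        await page.click(action.selector);
        break;
      case "extractList":
        rows.push(...(await page.queryList(action.itemSelector, action.fields)));
        break;
    }
  }
  return rows;
}

// A hypothetical "recording": open a listings page, click "load more",
// then extract a title and price from each listing card.
const recordedRobot: Action[] = [
  { kind: "goto", url: "https://example.com/listings" },
  { kind: "click", selector: "button.load-more" },
  { kind: "extractList", itemSelector: ".listing", fields: { title: ".title", price: ".price" } },
];
```

Storing the recording as plain data rather than code is one way to make it reusable. The same list could run on a schedule, or against a different browser backend that implements the same (hypothetical) `PageDriver` interface.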
#### b. LLM Extraction (Beta): Describe what you want in plain language
- Use natural language to define extraction patterns.
- Works with both closed-source and open-source LLMs.

Get started with LLM extraction: https://docs.maxun.dev/robot/extract/llm-extraction
##### Example: Extract Names, Rating & Duration of Top 50 Movies from IMDb
https://github.com/user-attachments/assets/f714e860-58d6-44ed-bbcd-c9374b629384
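
The sketch below is a rough, general illustration of natural-language extraction. It combines a plain-language instruction with page text, sends both to a model, and parses the JSON reply into rows. The `Complete` callback type, the prompt wording, and the function names are assumptions, not Maxun's implementation. Passing the model in as a callback keeps a design like this model-agnostic, because a client for either a closed-source or an open-source LLM can sit behind the same signature.

```typescript
// Any LLM client can be plugged in as a function that maps a prompt to the
// model's text reply. This signature is an assumption for illustration.
type Complete = (prompt: string) => Promise<string>;

type Row = Record<string, string>;

// Combines the user's plain-language request with the page text.
function buildPrompt(instruction: string, pageText: string): string {
  return [
    "Extract data from the web page below.",
    `Instruction: ${instruction}`,
    "Reply with only a JSON array of flat objects.",
    "Page:",
    pageText,
  ].join("\n");
}

// Parses the reply, tolerating extra text before or after the JSON array.
function parseRows(reply: string): Row[] {
  const start = reply.indexOf("[");
  const end = reply.lastIndexOf("]");
  if (start === -1 || end < start) throw new Error("no JSON array in reply");
  const parsed: unknown = JSON.parse(reply.slice(start, end + 1));
  if (!Array.isArray(parsed)) throw new Error("reply is not an array");
  return parsed.map((item): Row => {
    if (typeof item !== "object" || item === null || Array.isArray(item)) {
      throw new Error("each row must be an object");
    }
    // Stringify every value so downstream code sees a uniform shape.
    const row: Row = {};
    for (const [key, value] of Object.entries(item)) row[key] = String(value);
    return row;
  });
}

// Hypothetical end-to-end helper: prompt the model, then parse its reply.
async function extractWithLLM(instruction: string, pageText: string, complete: Complete): Promise<Row[]> {
  return parseRows(await complete(buildPrompt(instruction, pageText)));
}
```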
#### Core capabilities
- Extract from any website, including pages behind logins
- Convert sites into APIs, spreadsheets, and workflows
- Scale extractions and run on schedules or via API
- Handle infinite scrolling and pagination
- Auto-adapt to website layout & structural changes
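
Pagination handling comes down to a loop with clear stop conditions. The generic sketch below follows "next" pointers and stops when there are none left, when a row limit is reached, or when a URL repeats. `Page`, `collectAll`, and the `fetchPage` callback are hypothetical. For infinite scrolling, "next" would instead mean scrolling and waiting for new items. The sketch illustrates the technique and is not Maxun's internal code.

```typescript
// One page of results and a pointer to the next page (null on the last page).
interface Page<T> {
  items: T[];
  next: string | null;
}

// Follows "next" pointers until there are none, the item limit is reached,
// or a URL repeats (a guard against pagination loops).
async function collectAll<T>(
  firstUrl: string,
  fetchPage: (url: string) => Promise<Page<T>>,
  limit: number,
): Promise<T[]> {
  const items: T[] = [];
  const visited = new Set<string>();
  let url: string | null = firstUrl;
  while (url !== null && items.length < limit && !visited.has(url)) {
    visited.add(url);
    // Annotated explicitly; without it TypeScript can report a
    // circular-inference error for values awaited inside such loops.
    const page: Page<T> = await fetchPage(url);
    items.push(...page.items);
    url = page.next;
  }
  // The last page may overshoot the limit, so trim to exactly `limit` items.
  return items.slice(0, limit);
}
```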
### 2. Scrape Robots
**Built for clean content and AI workflows.**
- Get clean HTML and LLM-ready Markdown from any website
- Remove scripts, styling, ads, and clutter automatically
- Perfect for RAG systems, AI summarization, embeddings, and content pipelines
- Ideal for feeding clean data to LLMs
#### Example: Scrape GitHub Trending Repositories in clean Markdown format
https://github.com/user-attachments/assets/c774cbd4-5a85-45b7-b41f-128ee570eae6
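
The deliberately naive TypeScript sketch below makes "clean HTML and LLM-ready Markdown" concrete. It drops `<script>` and `<style>` content and maps headings, links, and list items to Markdown. `htmlToMarkdown` is a hypothetical helper, not Maxun's scraper. Regular expressions cannot handle arbitrary HTML, so a real pipeline should use an HTML parser or a dedicated HTML-to-Markdown converter.

```typescript
// Naive HTML-to-Markdown cleanup, for illustration only. Regular expressions
// cannot handle arbitrary HTML; a real pipeline should use an HTML parser.
function htmlToMarkdown(html: string): string {
  return (
    html
      // Drop elements whose content is noise for an LLM.
      .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
      // <h2>Text</h2> becomes "## Text" as its own block.
      .replace(/<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi, (_match: string, level: string, text: string) =>
        `\n\n${"#".repeat(Number(level))} ${text}\n\n`)
      // <a href="url">text</a> becomes [text](url) (double-quoted href only).
      .replace(/<a\b[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
      // List items become "- " bullets; paragraph ends and <br> become blank lines.
      .replace(/<li\b[^>]*>/gi, "\n- ")
      .replace(/<\/p>|<br\s*\/?>/gi, "\n\n")
      // Strip every remaining tag but keep its text.
      .replace(/<[^>]+>/g, "")
      // Decode a few common entities (&amp; last so "&amp;lt;" becomes "&lt;").
      .replace(/&nbsp;/g, " ")
      .replace(/&lt;/g, "<")
      .replace(/&gt;/g, ">")
      .replace(/&amp;/g, "&")
      // Collapse runs of blank lines and trim the ends.
      .replace(/\n{3,}/g, "\n\n")
      .trim()
  );
}
```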
## Quick Start
### Getting Started
The simplest & fastest way to get started is to use the hosted version: https://app.maxun.dev. You can self-host if you prefer!
### Installation
Maxun can run locally with or without Docker:
1. [Setup with Docker Compose](https://docs.maxun.dev/installation/docker)
2. [Setup without Docker](https://docs.maxun.dev/installation/local)
3. [Environment Variables](https://docs.maxun.dev/installation/environment_variables)
4. [SDK](https://github.com/getmaxun/node-sdk)
### Upgrading & Self-Hosting
1. [Self-Host Maxun With Docker & Portainer](https://docs.maxun.dev/self-host)
2. [Upgrade Maxun With Docker Compose Setup](https://docs.maxun.dev/installation/upgrade#upgrading-with-docker-compose)
3. [Upgrade Maxun Without Docker Compose Setup](https://docs.maxun.dev/installation/upgrade#upgrading-with-local-setup)
## Sponsors
<table>
<tr>
<td width="229">
<br/>
<a href="https://www.lambdatest.com/?utm_source=maxun&utm_medium=sponsor" target="_blank">
<img src="https://github.com/user-attachments/assets/904dd40e-0498-47dd-98f1-7fa6d318adb9" /><br/><br/>
<b>LambdaTest</b>
</a>
<br/>
<sub>GenAI-powered Quality Engineering Platform that empowers teams to test intelligently, smarter, and ship faster.</sub>
</td>
</tr>
</table>

## Features
- **Extract Data With No-Code** - Point-and-click interface
- **LLM-Powered Extraction** - Describe what you want; use LLMs to scrape structured data
- **Developer SDK** - Programmatic extraction, scheduling, and robot management
- **Handle Pagination & Scrolling** - Automatic navigation
- **Run Robots On Schedules** - Set it and forget it
- **Turn Websites to APIs** - RESTful endpoints from any site
- **Turn Websites to Spreadsheets** - Direct data export to Google Sheets & Airtable
- **Adapt To Website Layout Changes** - Auto-recovery from site updates
- **Extract Behind Login** - Handle authentication seamlessly
- **Integrations** - Connect with your favorite tools
- **MCP Support** - Model Context Protocol integration
- **LLM-Ready Data** - Clean Markdown for AI applications
- **Self-Hostable** - Full control over your infrastructure
- **Open Source** - Transparent and community-driven
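
Maxun exports spreadsheet data directly to Google Sheets and Airtable. Any such export has to flatten extracted records into rows and columns first. The generic sketch below shows that step with CSV as the target format, because most spreadsheet tools can import CSV. `toCsv` and `csvField` are hypothetical helpers, and the quoting follows RFC 4180 conventions. The sketch does not talk to Google Sheets or Airtable.

```typescript
type Cell = string | number | null;
type CsvRow = Record<string, Cell>;

// Quotes a field (RFC 4180 style) when it contains a comma, quote, or line break.
function csvField(value: Cell): string {
  const text = value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Builds the header from the union of keys (in first-seen order), then
// emits one line per row, leaving missing fields empty.
function toCsv(rows: CsvRow[]): string {
  const headers: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!headers.includes(key)) headers.push(key);
    }
  }
  const lines = [headers.map((h) => csvField(h)).join(",")];
  for (const row of rows) {
    lines.push(headers.map((h) => csvField(row[h] ?? null)).join(","));
  }
  return lines.join("\r\n");
}
```

The header is the union of all keys, so rows with missing fields still line up under the right columns. A listing with no price, for example, gets an empty cell.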
## Use Cases
Maxun can be used for a wide range of use cases, including lead generation, market research, and content aggregation.
See detailed use cases at https://www.maxun.dev/#usecases
## Note
This project is in the early stages of development. Your feedback is very important to us, and we're actively working on improvements.
## License
<p>
This project is licensed under <a href="./LICENSE">AGPLv3</a>.
</p>

## Support Us
Star the repository, contribute if you love what we're building, or [sponsor us](https://github.com/sponsors/amhsirak).
## Contributors
Thank you to everyone who contributes!

<a href="https://github.com/getmaxun/maxun/graphs/contributors">
<img src="https://contrib.rocks/image?repo=getmaxun/maxun" />
</a>