Karishma Shukla ad8df66ecd chore: s
2025-11-24 21:06:44 +05:30

<h2 align="center">
<div>
<a href="https://www.maxun.dev/?ref=ghread">
<img src="/src/assets/maxunlogo.png" width="70" />
<br>
Maxun
</a>
</div>
Transform the Web into Structured Intelligence<br>
</h2>
<p align="center">
✨ Turn any website into clean, contextualized data pipelines for your AI applications ✨
<br />
Maxun is the easiest way to extract web data with no code. The <b>modern</b> open-source alternative to BrowseAI, Octoparse and similar tools.
</p>
<p align="center">
<a href="https://app.maxun.dev/?ref=ghread"><b>Go To App</b></a> •
<a href="https://docs.maxun.dev/?ref=ghread"><b>Documentation</b></a> •
<a href="https://www.maxun.dev/?ref=ghread"><b>Website</b></a> •
<a href="https://discord.gg/5GbPjBUkws"><b>Discord</b></a> •
<a href="https://www.youtube.com/@MaxunOSS?ref=ghread"><b>Watch Tutorials</b></a>
<br />
<br />
<a href="https://trendshift.io/repositories/12113" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12113" alt="getmaxun%2Fmaxun | Trendshift" style="width: 250px; height: 55px; margin-top: 10px;" width="250" height="55"/></a>
</p>
## What is Maxun?
Maxun is a no-code platform for web data extraction. Its point-and-click interface lets anyone extract data from **any** website. In just minutes, users can build automation robots that **turn websites into structured APIs, LLM-ready Markdown, and spreadsheets, extract data at scale, and much more.**
## How Does It Work?
Maxun uses web robots to power everything you can do on the platform. There are two types of robots, each designed for a different job.
### 1. Extract Robots
**Extract robots emulate real user behavior and capture structured data at scale.**
- Built for automation and structured data extraction
- Point-and-click interface - no coding required
- Extract from any website, including behind logins
- Record user actions (clicks, scrolls, form fills, pagination, etc.)
- Convert sites into APIs, spreadsheets, and workflows
- Scale extractions and run on schedules or via API
- Handle infinite scrolling and pagination
- Auto-adapt to website layout & structural changes
https://github.com/user-attachments/assets/c6baa75f-b950-482c-8d26-8a8b6c5382c3
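
To make the pagination bullet above concrete, here is a minimal Python sketch of the loop any paginated extraction has to perform: read a page, collect its rows, follow the "next" link, and stop when there is none. This is an illustration of the general technique, not Maxun's implementation. The `paginate` function, the `fetch_page` callback, and the `{"rows": ..., "next": ...}` page shape are all assumptions made for this example.

```python
from typing import Callable, Iterator, Optional

# A "page" is a hypothetical dict: {"rows": [...], "next": <url or None>}.
Page = dict


def paginate(start_url: str,
             fetch_page: Callable[[str], Page],
             max_pages: int = 100) -> Iterator[dict]:
    """Yield rows page by page, following "next" links until they run out."""
    url: Optional[str] = start_url
    seen = set()
    pages = 0
    while url is not None and url not in seen and pages < max_pages:
        seen.add(url)              # guard against pagination loops
        page = fetch_page(url)     # caller decides how a page is loaded
        yield from page["rows"]
        url = page.get("next")
        pages += 1


# Usage with an in-memory stand-in for a real site (illustrative data only).
fake_site = {
    "/p1": {"rows": [{"id": 1}, {"id": 2}], "next": "/p2"},
    "/p2": {"rows": [{"id": 3}], "next": None},
}
rows = list(paginate("/p1", fake_site.__getitem__))
```

Passing the fetcher in as a callback keeps the loop independent of how pages are actually loaded, and the `seen` set and `max_pages` cap keep a misbehaving site from causing an endless run.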
### 2. Scrape Robots
**Built for clean content and AI workflows.**
- Get clean HTML and LLM-ready Markdown from any website
- Remove scripts, styling, ads, and clutter automatically
- Perfect for RAG systems, AI summarization, embeddings, and content pipelines
- Extract main content while filtering out navigation and irrelevant elements
- Ideal for feeding clean data to large language models
https://github.com/user-attachments/assets/c774cbd4-5a85-45b7-b41f-128ee570eae6
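
As a rough idea of what "removing scripts, styling, and clutter" involves, the sketch below strips a few non-content elements from an HTML string and keeps the visible text, using only the Python standard library. It is a simplified illustration, not how Maxun's Scrape robots work internally. The `SKIPPED_TAGS` list and the `clean_html` helper are assumptions chosen for this example.

```python
from html.parser import HTMLParser

# Elements whose entire contents are dropped (an illustrative choice).
SKIPPED_TAGS = {"script", "style", "noscript", "nav", "footer", "aside"}


class MainTextExtractor(HTMLParser):
    """Collects visible text, skipping anything nested inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self) -> str:
        return "\n".join(self._chunks)


def clean_html(html: str) -> str:
    """Return the visible text of an HTML document, one text node per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A real content extractor also has to decide which block is the main article and how to render headings, links, and lists as Markdown; this sketch covers only the filtering step.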
## Quick Start
### Getting Started
The simplest & fastest way to get started is to use the hosted version: https://app.maxun.dev. You can self-host if you prefer!
### Installation
Maxun can run locally with or without Docker:
1. [Setup with Docker Compose](https://docs.maxun.dev/installation/docker)
2. [Setup without Docker](https://docs.maxun.dev/installation/local)
3. [Environment Variables](https://docs.maxun.dev/installation/environment_variables)
### Upgrading & Self Hosting
1. [Self Host Maxun With Docker & Portainer](https://docs.maxun.dev/self-host)
2. [Upgrade Maxun With Docker Compose Setup](https://docs.maxun.dev/installation/upgrade#upgrading-with-docker-compose)
3. [Upgrade Maxun Without Docker Compose Setup](https://docs.maxun.dev/installation/upgrade#upgrading-with-local-setup)
## What Can Robots Do?
- **Open webpages** and navigate sites automatically
- **Log in** to secured websites and maintain sessions
- **Click on buttons**, links, and interactive elements
- **Fill out forms** with custom data
- **Select from dropdowns**, radios, checkboxes, dates, times, etc.
- **Take screenshots** - full page or visible sections
- **Capture structured data** without writing code
- **Handle infinite scrolling** and pagination automatically
- **Run on schedules** - set it and forget it
- **Trigger via APIs** for third-party integrations
- **Extract behind login** walls and authentication
- **Integrate with applications** like N8N, Google Sheets, Airtable, and more
- **Send data to webhooks** for real-time processing
- **Get clean HTML** from websites for AI applications
- **Turn websites into LLM-ready Markdown** for AI pipelines
- **Talk to your LLM** with MCP (Model Context Protocol)
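
For the webhook item in the list above, the receiving side can be as small as the Python sketch below: an HTTP endpoint that accepts a JSON `POST` and summarizes it. The payload shape is an assumption. The `robot_id` and `rows` keys are hypothetical and are not taken from Maxun's documentation, so inspect a real delivery before relying on any field names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_payload(raw_body: bytes) -> dict:
    """Parse a JSON webhook body and return a small summary.

    The "robot_id" and "rows" keys are hypothetical placeholders.
    """
    payload = json.loads(raw_body.decode("utf-8"))
    rows = payload.get("rows", [])
    return {"robot_id": payload.get("robot_id"), "row_count": len(rows)}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            summary = summarize_payload(self.rfile.read(length))
        except (ValueError, AttributeError):
            # Malformed JSON, bad encoding, or a body that is not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        print("received:", summary)
        self.send_response(204)   # acknowledge with no response body
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block forever, handling webhook deliveries on localhost."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. Keeping the parsing in `summarize_payload` means it can be tested without opening a socket.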
## Sponsors
<table>
<tr>
<td width="229">
<br/>
<a href="https://www.lambdatest.com/?utm_source=maxun&utm_medium=sponsor" target="_blank">
<img src="https://github.com/user-attachments/assets/904dd40e-0498-47dd-98f1-7fa6d318adb9" /><br/><br/>
<b>LambdaTest</b>
</a>
<br/>
<sub>GenAI-powered Quality Engineering Platform that empowers teams to test intelligently, smarter, and ship faster.</sub>
</td>
<td width="250">
<a href="https://app.cyberyozh.com/?utm_source=github&utm_medium=maxun" target="_blank">
<img src="https://github.com/user-attachments/assets/c0ae7929-003a-4e1e-b23b-d174ac0aba4f" /><br/>
<b>CyberYozh App</b>
</a>
<br/>
<sub>Infrastructure for developers working with multiaccounting & automation in one place.</sub>
</td>
</tr>
</table>
## Features
- **Extract Data With No-Code** - Point and click interface
- **Two Robot Types** - Extract for structured data, Scrape for clean content
- **Handle Pagination & Scrolling** - Automatic navigation
- **Run Robots On Schedules** - Set it and forget it
- **Turn Websites to APIs** - RESTful endpoints from any site
- **Turn Websites to Spreadsheets** - Direct data export
- **Adapt To Website Layout Changes** - Auto-recovery from site updates
- **Extract Behind Login** - Handle authentication seamlessly
- **Integrations** - Connect with your favorite tools
- **MCP Support** - Model Context Protocol integration
- **LLM-Ready Data** - Clean Markdown for AI applications
- **Self-Hostable** - Full control over your infrastructure
- **Open Source** - Transparent and community-driven
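
To illustrate what turning extracted data into a spreadsheet amounts to, the Python sketch below serializes a list of row dictionaries to CSV text. It assumes the rows have already been extracted. The `rows_to_csv` helper and the sample field names are made up for this example and do not reflect Maxun's export code.

```python
import csv
import io


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize row dicts to CSV text, using the union of keys as columns."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)   # keep first-seen column order
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)            # missing fields are written as empty cells
    return buffer.getvalue()


# Illustrative rows; real extractions define their own fields.
sample = [
    {"name": "A", "price": "1"},
    {"name": "B", "url": "u"},
]
csv_text = rows_to_csv(sample)
```

Taking the union of keys handles rows where some fields were not found on the page, which is common when a site's listings are not uniform.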
### Use Cases
Maxun supports a range of use cases, including lead generation, market research, content aggregation, and more.
View the use cases in detail here: https://www.maxun.dev/#usecases
### Screenshots
![Maxun PH Launch (1)-1-1](https://github.com/user-attachments/assets/d7c75fa2-2bbc-47bb-a5f6-0ee6c162f391)
![Maxun PH Launch (1)-2-1](https://github.com/user-attachments/assets/d85a3ec7-8ce8-4daa-89aa-52d9617e227a)
![Maxun PH Launch (1)-3-1](https://github.com/user-attachments/assets/4bd5a0b4-485d-44f4-a487-edd9afc18b11)
![Maxun PH Launch (1)-4-1](https://github.com/user-attachments/assets/78140675-a6b6-49b2-981f-6a3d9a32b0b9)
![Maxun PH Launch (1)-5-1](https://github.com/user-attachments/assets/d9fe8519-c81c-4e45-92f2-b2939bf24192)
![Maxun PH Launch (1)-6-1](https://github.com/user-attachments/assets/c26e9ae3-c3da-4280-826a-c7cdf913fb93)
![Maxun PH Launch (1)-7-1](https://github.com/user-attachments/assets/fd7196f4-a6dc-4c4c-9c76-fdd93fac8247)
![Maxun PH Launch (1)-8-1](https://github.com/user-attachments/assets/16ee4a71-772a-49ae-a0e5-cb0529519bda)
![Maxun PH Launch (1)-9-1](https://github.com/user-attachments/assets/160f46fa-0357-4c1b-ba50-b4fe64453bb7)
### Note
This project is in the early stages of development. Your feedback is very important to us, and we're actively working on improvements.
### License
<p>
This project is licensed under <a href="./LICENSE">AGPLv3</a>.
</p>
### Support Us
Star the repository, contribute if you love what we're building, or [sponsor us](https://github.com/sponsors/amhsirak).
### Contributors
Thank you to the combined efforts of everyone who contributes!
<a href="https://github.com/getmaxun/maxun/graphs/contributors">
<img src="https://contrib.rocks/image?repo=getmaxun/maxun" />
</a>