<h2 align="center">
<div>
<a href="https://www.maxun.dev/?ref=ghread">
<img src="/src/assets/maxunlogo.png" width="70" />
<br>
Maxun
</a>
</div>
Transform the Web into Structured Intelligence<br>
</h2>
<p align="center">
✨ Turn any website into clean, contextualized data pipelines for your AI applications ✨
<br />
Maxun is the easiest way to extract web data with no code. The <b>modern</b> open-source alternative to BrowseAI, Octoparse and similar tools.
</p>
<p align="center">
<a href="https://app.maxun.dev/?ref=ghread"><b>Go To App</b></a> •
<a href="https://docs.maxun.dev/?ref=ghread"><b>Documentation</b></a> •
<a href="https://www.maxun.dev/?ref=ghread"><b>Website</b></a> •
<a href="https://discord.gg/5GbPjBUkws"><b>Discord</b></a> •
<a href="https://www.youtube.com/@MaxunOSS?ref=ghread"><b>Watch Tutorials</b></a>
<br />
<br />
<a href="https://trendshift.io/repositories/12113" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12113" alt="getmaxun%2Fmaxun | Trendshift" style="width: 250px; height: 55px; margin-top: 10px;" width="250" height="55"/></a>
</p>

## What is Maxun?
Maxun is a no-code ecosystem for web data extraction. Its point-and-click interface lets anyone extract data from **any** website without writing code. In just minutes, users can build automation robots that **turn websites into structured APIs, LLM-ready Markdown, and spreadsheets, extract data at scale, and much more.**
## How Does It Work?
Maxun uses web robots to power everything you can do on the platform. There are two types of robots, each designed for a different job.
### 1. Extract Robots
**Extract robots emulate real user behavior and capture structured data at scale.**
- Built for automation and structured data extraction
- Point-and-click interface - no coding required
- Extract from any website, including behind logins
- Record user actions (clicks, scrolls, form fills, pagination, etc.)
- Convert sites into APIs, spreadsheets, and workflows
- Scale extractions and run on schedules or via API
- Handle infinite scrolling and pagination
- Auto-adapt to website layout & structural changes

https://github.com/user-attachments/assets/c6baa75f-b950-482c-8d26-8a8b6c5382c3
### 2. Scrape Robots
**Built for clean content and AI workflows.**
- Get clean HTML and LLM-ready Markdown from any website
- Remove scripts, styling, ads, and clutter automatically
- Perfect for RAG systems, AI summarization, embeddings, and content pipelines
- Extract main content while filtering out navigation and irrelevant elements
- Ideal for feeding clean data to large language models

https://github.com/user-attachments/assets/c774cbd4-5a85-45b7-b41f-128ee570eae6
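
To make the idea behind Scrape robots concrete, here is a minimal Python sketch of stripping scripts, styling, and navigation from a page and keeping only the visible text. It is an illustration under stated assumptions, not Maxun's implementation: the `SKIPPED_TAGS` set and the `extract_main_text` helper are hypothetical names chosen for this example, and a real content extractor would need far more robust heuristics.

```python
from html.parser import HTMLParser

# Tags whose entire contents are dropped.
# This list is an illustrative assumption, not the set Maxun uses.
SKIPPED_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside"}


class MainTextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside at least one skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_main_text(html: str) -> str:
    """Return the text of `html` with clutter tags removed (hypothetical helper)."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = (
        "<html><body><nav><a href='/'>Home</a></nav>"
        "<h1>Pricing</h1><p>Plans start at $10.</p>"
        "<script>track()</script></body></html>"
    )
    print(extract_main_text(page))
```

The sketch only uses the standard library's `html.parser`, which keeps it dependency-free; it does not render JavaScript or produce Markdown, both of which a hosted robot would handle for you.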
## Quick Start
### Getting Started
The simplest & fastest way to get started is to use the hosted version: https://app.maxun.dev. You can self-host if you prefer!
### Installation
Maxun can run locally with or without Docker:
1. [Setup with Docker Compose](https://docs.maxun.dev/installation/docker)
2. [Setup without Docker](https://docs.maxun.dev/installation/local)
3. [Environment Variables](https://docs.maxun.dev/installation/environment_variables)
### Upgrading & Self Hosting
1. [Self Host Maxun With Docker & Portainer](https://docs.maxun.dev/self-host)
2. [Upgrade Maxun With Docker Compose Setup](https://docs.maxun.dev/installation/upgrade#upgrading-with-docker-compose)
3. [Upgrade Maxun Without Docker Compose Setup](https://docs.maxun.dev/installation/upgrade#upgrading-with-local-setup)
## What Can Robots Do?
- **Open webpages** and navigate sites automatically
- **Log in** to secured websites and maintain sessions
- **Click on buttons**, links, and interactive elements
- **Fill out forms** with custom data
- **Select from dropdowns**, radios, checkboxes, dates, times, etc.
- **Take screenshots** - full page or visible sections
- **Capture structured data** without writing code
- **Handle infinite scrolling** and pagination automatically
- **Run on schedules** - set it and forget it
- **Trigger via APIs** for third-party integrations
- **Extract behind login** walls and authentication
- **Integrate with applications** like N8N, Google Sheets, Airtable, and more
- **Send data to webhooks** for real-time processing
- **Get clean HTML** from websites for AI applications
- **Turn websites into LLM-ready markdown** for AI pipelines
- **Talk to your LLM** with MCP (Model Context Protocol)
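
As a rough illustration of the "Trigger via APIs" item, the Python sketch below builds an HTTP request that could start a robot run from a third-party script. Everything specific in it is an assumption made for this example: the `/api/robots/<id>/runs` path, the `x-api-key` header name, the empty JSON body, and the `build_run_request` helper are placeholders, so check the API reference in the documentation for the actual endpoints and authentication scheme before using it.

```python
import json
import urllib.request


def build_run_request(base_url: str, robot_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that would start a robot run.

    The URL path and header name are hypothetical placeholders.
    """
    url = f"{base_url.rstrip('/')}/api/robots/{robot_id}/runs"  # assumed path
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),  # assumed: no run parameters needed
        headers={
            "x-api-key": api_key,  # assumed header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_run_request("http://localhost:8080", "my-robot-id", "my-api-key")
    print(request.get_method(), request.full_url)
    # To actually send it (requires a running instance and a valid key):
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```

Separating request construction from sending keeps the sketch easy to inspect and test without a running instance.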
## Sponsors
<table>
<tr>
<td width="229">
<br/>
<a href="https://www.lambdatest.com/?utm_source=maxun&utm_medium=sponsor" target="_blank">
<img src="https://github.com/user-attachments/assets/904dd40e-0498-47dd-98f1-7fa6d318adb9" /><br/><br/>
<b>LambdaTest</b>
</a>
<br/>
<sub>GenAI-powered Quality Engineering Platform that empowers teams to test intelligently, smarter, and ship faster.</sub>
</td>
</tr>
</table>

## Features
- **Extract Data With No-Code** - Point and click interface
- **Two Robot Types** - Extract for structured data, Scrape for clean content
- **Handle Pagination & Scrolling** - Automatic navigation
- **Run Robots On Schedules** - Set it and forget it
- **Turn Websites to APIs** - RESTful endpoints from any site
- **Turn Websites to Spreadsheets** - Direct data export
- **Adapt To Website Layout Changes** - Auto-recovery from site updates
- **Extract Behind Login** - Handle authentication seamlessly
- **Integrations** - Connect with your favorite tools
- **MCP Support** - Model Context Protocol integration
- **LLM-Ready Data** - Clean Markdown for AI applications
- **Self-Hostable** - Full control over your infrastructure
- **Open Source** - Transparent and community-driven
### Use Cases
Maxun supports a range of use cases, including lead generation, market research, content aggregation, and more.
View use-cases in detail here: https://www.maxun.dev/#usecases
### Screenshots
![Maxun PH Launch (1)-1-1](https://github.com/user-attachments/assets/d7c75fa2-2bbc-47bb-a5f6-0ee6c162f391)
![Maxun PH Launch (1)-2-1](https://github.com/user-attachments/assets/d85a3ec7-8ce8-4daa-89aa-52d9617e227a)
![Maxun PH Launch (1)-3-1](https://github.com/user-attachments/assets/4bd5a0b4-485d-44f4-a487-edd9afc18b11)
![Maxun PH Launch (1)-4-1](https://github.com/user-attachments/assets/78140675-a6b6-49b2-981f-6a3d9a32b0b9)
![Maxun PH Launch (1)-5-1](https://github.com/user-attachments/assets/d9fe8519-c81c-4e45-92f2-b2939bf24192)
![Maxun PH Launch (1)-6-1](https://github.com/user-attachments/assets/c26e9ae3-c3da-4280-826a-c7cdf913fb93)
![Maxun PH Launch (1)-7-1](https://github.com/user-attachments/assets/fd7196f4-a6dc-4c4c-9c76-fdd93fac8247)
![Maxun PH Launch (1)-8-1](https://github.com/user-attachments/assets/16ee4a71-772a-49ae-a0e5-cb0529519bda)
![Maxun PH Launch (1)-9-1](https://github.com/user-attachments/assets/160f46fa-0357-4c1b-ba50-b4fe64453bb7)
### Note
This project is in the early stages of development. Your feedback is very important to us, and we're actively working on improvements.
### License
<p>
This project is licensed under <a href="./LICENSE">AGPLv3</a>.
</p>

### Support Us
Star the repository, contribute if you love what we're building, or [sponsor us](https://github.com/sponsors/amhsirak).
### Contributors
Thank you to the combined efforts of everyone who contributes!
<a href="https://github.com/getmaxun/maxun/graphs/contributors">
<img src="https://contrib.rocks/image?repo=getmaxun/maxun" />
</a>