From edc0c3247779605a9d2006f29fd4205a472c04c3 Mon Sep 17 00:00:00 2001 From: Karishma Shukla Date: Wed, 30 Oct 2024 08:35:10 +0530 Subject: [PATCH 01/12] chore: readme --- README.md | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) create mode 100644 README.md diff --git a/README.md b/README.md new file mode 100644 index 00000000..c82070b6 --- /dev/null +++ b/README.md @@ -0,0 +1,59 @@ +

+
+ + +
+ Maxun +
+
+ Open-Source No-Code Web Data Extraction Platform
+

+ +

+Maxun lets you train a robot in 2 minutes and scrape the web on auto-pilot. Web data extraction doesn't get easier than this! +

+ + +

+ Discord • + Twitter • +

+ +// add demo video here + + + + +# Join Our Community + +

+ Discord • + Twitter • +

+ +# Installation + +# Features + +# Cloud + +# Contributing + +Please refer to [Contribution Guide](https://github.com/amhsirak/maxun/blob/master/.github/CONTRIBUTING.md). + +# + + +# License + +

+This project is licensed under AGPLv3. +

+ +# Contributors + +Thank you to the combined efforts of everyone who contributes! + + + + From b70e76d993186211bc8b18e3c15a2e0ed17164d8 Mon Sep 17 00:00:00 2001 From: Karishma Shukla Date: Wed, 30 Oct 2024 09:06:36 +0530 Subject: [PATCH 02/12] feat: robot actions and features --- README.md | 34 +++++++++++++++++++++++++++------- 1 file changed, 27 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index c82070b6..52027eba 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -

+

@@ -7,7 +7,7 @@
Open-Source No-Code Web Data Extraction Platform
-

+

Maxun lets you train a robot in 2 minutes and scrape the web on auto-pilot. Web data extraction doesn't get easier than this! @@ -19,8 +19,7 @@ Maxun lets you train a robot in 2 minutes and scrape the web on auto-pilot. Web Twitter

-// add demo video here - +![maxun_demo](https://github.com/user-attachments/assets/a61ba670-e56a-4ae1-9681-0b4bd6ba9cdc) @@ -33,17 +32,38 @@ Maxun lets you train a robot in 2 minutes and scrape the web on auto-pilot. Web # Installation +# How Does It Work? +Maxun lets you create custom robots which emulate user actions and extract data. A robot can perform any of the actions: Capture List, Capture Text or Capture Screenshot. Once a robot is created, it will keep extracting data for you without manual intervention + +![Screenshot 2024-10-23 222138](https://github.com/user-attachments/assets/53573c98-769e-490d-829e-ada9fac0764f) + +### 1. Robot Actions +1. Capture List: Useful to extract structured and bulk items from the website. Example: Scrape products from Amazon etc. +2. Capture Text: Useful to extract individual text content from the website. +3. Capture Screenshot: Get fullpage or visible section screenshots of the website. + +### 2. BYOP +BYOP (Bring Your Own Proxy) lets you connect external proxies to bypass anti-bot protection. Currently, the proxies are per user. Soon you'll be able to configure proxy per robot. + + # Features +- Extract Data With No-Code +- Handle Pagination & Scrolling +- Run Robots On A Specific Schedule +- Convert Websites to APIs +- Convert Websites to Spreadsheets +- Adapt To Website Layout Changes (coming soon) +- Extract Behind Login, With Two-Factor Authentication Support (coming soon) +- Integrations (currently Google Sheet) +- +++ A lot of amazing things soon! # Cloud +We offer a managed cloud version to run Maxun without having to manage the infrastructure and extract data at scale. Maxun cloud also deals with anti-bot detection, huge proxy network, and CAPTCHA solving. If this interests you, [join the cloud waitlist](https://docs.google.com/forms/d/e/1FAIpQLSdbD2uhqC4sbg4eLZ9qrFbyrfkXZ2XsI6dQ0USRCQNZNn5pzg/viewform) as we launch soon. 
# Contributing Please refer to [Contribution Guide](https://github.com/amhsirak/maxun/blob/master/.github/CONTRIBUTING.md). -# - - # License

From 7f17a834e99e8a00172f0a61c3f62169555aedb6 Mon Sep 17 00:00:00 2001 From: Karishma Shukla Date: Wed, 30 Oct 2024 09:24:01 +0530 Subject: [PATCH 03/12] chore: features --- README.md | 31 +++++++++++-------------------- 1 file changed, 11 insertions(+), 20 deletions(-) diff --git a/README.md b/README.md index 52027eba..6f493030 100644 --- a/README.md +++ b/README.md @@ -15,21 +15,15 @@ Maxun lets you train a robot in 2 minutes and scrape the web on auto-pilot. Web

+ WebsiteDiscord • - Twitter • + Twitter

![maxun_demo](https://github.com/user-attachments/assets/a61ba670-e56a-4ae1-9681-0b4bd6ba9cdc) -# Join Our Community - -

- Discord • - Twitter • -

- # Installation # How Does It Work? @@ -47,31 +41,28 @@ BYOP (Bring Your Own Proxy) lets you connect external proxies to bypass anti-bot # Features -- Extract Data With No-Code -- Handle Pagination & Scrolling -- Run Robots On A Specific Schedule -- Convert Websites to APIs -- Convert Websites to Spreadsheets -- Adapt To Website Layout Changes (coming soon) -- Extract Behind Login, With Two-Factor Authentication Support (coming soon) -- Integrations (currently Google Sheet) +- ✨ Extract Data With No-Code +- ✨ Handle Pagination & Scrolling +- ✨ Run Robots On A Specific Schedule +- ✨ Turn Websites to APIs +- ✨ Turn Websites to Spreadsheets +- ✨ Adapt To Website Layout Changes (coming soon) +- ✨ Extract Behind Login, With Two-Factor Authentication Support (coming soon) +- ✨ Integrations (currently Google Sheet) - +++ A lot of amazing things soon! # Cloud -We offer a managed cloud version to run Maxun without having to manage the infrastructure and extract data at scale. Maxun cloud also deals with anti-bot detection, huge proxy network, and CAPTCHA solving. If this interests you, [join the cloud waitlist](https://docs.google.com/forms/d/e/1FAIpQLSdbD2uhqC4sbg4eLZ9qrFbyrfkXZ2XsI6dQ0USRCQNZNn5pzg/viewform) as we launch soon. +We offer a managed cloud version to run Maxun without having to manage the infrastructure and extract data at scale. Maxun cloud also deals with anti-bot detection, huge proxy network with automatic proxy rotation, and CAPTCHA solving. If this interests you, [join the cloud waitlist](https://docs.google.com/forms/d/e/1FAIpQLSdbD2uhqC4sbg4eLZ9qrFbyrfkXZ2XsI6dQ0USRCQNZNn5pzg/viewform) as we launch soon. # Contributing - Please refer to [Contribution Guide](https://github.com/amhsirak/maxun/blob/master/.github/CONTRIBUTING.md). # License -

This project is licensed under AGPLv3.

# Contributors - Thank you to the combined efforts of everyone who contributes! From 836dafa9915cf4777dc41ffe2604bbde18d42ab5 Mon Sep 17 00:00:00 2001 From: Karishma Shukla Date: Wed, 30 Oct 2024 09:25:59 +0530 Subject: [PATCH 04/12] chore: note --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 6f493030..30e89739 100644 --- a/README.md +++ b/README.md @@ -54,6 +54,9 @@ BYOP (Bring Your Own Proxy) lets you connect external proxies to bypass anti-bot # Cloud We offer a managed cloud version to run Maxun without having to manage the infrastructure and extract data at scale. Maxun cloud also deals with anti-bot detection, huge proxy network with automatic proxy rotation, and CAPTCHA solving. If this interests you, [join the cloud waitlist](https://docs.google.com/forms/d/e/1FAIpQLSdbD2uhqC4sbg4eLZ9qrFbyrfkXZ2XsI6dQ0USRCQNZNn5pzg/viewform) as we launch soon. +# Note +This project is in early stages of development. We're actively working to improve the product. + # Contributing Please refer to [Contribution Guide](https://github.com/amhsirak/maxun/blob/master/.github/CONTRIBUTING.md). From dd1fb6a13c564bc0220855316c922a8e52187b49 Mon Sep 17 00:00:00 2001 From: Karishma Shukla Date: Wed, 30 Oct 2024 10:35:31 +0530 Subject: [PATCH 05/12] wip: env variables --- README.md | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/README.md b/README.md index 30e89739..226ebfe2 100644 --- a/README.md +++ b/README.md @@ -26,6 +26,30 @@ Maxun lets you train a robot in 2 minutes and scrape the web on auto-pilot. Web # Installation +# Envirnoment Variables +| Variable | Mandatory | Description | If Not Set | +|--------------|-----------|----------------------------|----------------------------- | +| `NODE_ENV` | Yes | Sets whether you are running the app locally or in production. | | +| `JWT_SECRET` | Yes | JWT secret is utilized to generate authentication tokens. | | +| `DB_NAME` | Yes | Brief description here. 
| Describe what happens here. | +| `DB_USER` | Yes | Brief description here. | Describe what happens here. | +| `DB_PASSWORD` | Yes | Brief description here. | Describe what happens here. | +| `DB_NAME` | Yes | Brief description here. | Describe what happens here. | +| `DB_USER` | Yes | Brief description here. | Describe what happens here. | +| `DB_HOST` | Yes | Sets whether you are running the app locally or in production. | | +| `DB_PORT` | Yes | JWT secret is utilized to generate authentication tokens. | | +| `ENCRYPTION_KEY` | Yes | Brief description here. | Describe what happens here. | +| `MINIO_ENDPOINT` | Yes | Brief description here. | Describe what happens here. | +| `MINIO_PORT` | Yes | Brief description here. | Describe what happens here. | +| `MINIO_ACCESS_KEY` | Yes | Brief description here. | Describe what happens here. | +| `GOOGLE_CLIENT_ID` | Yes | Brief description here. | Describe what happens here. | +| `GOOGLE_CLIENT_SECRET` | Yes | Brief description here. | Describe what happens here. | +| `GOOGLE_REDIRECT_URI` | Yes | Brief description here. | Describe what happens here. | +| `REDIS_HOST` | Yes | Brief description here. | Describe what happens here. | +| `REDIS_PORT` | Yes | Brief description here. | Describe what happens here. | +| `MAXUN_TELEMETRY` | No | Brief description here. | Describe what happens here. | + + # How Does It Work? Maxun lets you create custom robots which emulate user actions and extract data. A robot can perform any of the actions: Capture List, Capture Text or Capture Screenshot. 
Once a robot is created, it will keep extracting data for you without manual intervention From e8389f44515110525abd5da660f4da12ae2f46b5 Mon Sep 17 00:00:00 2001 From: Karishma Shukla Date: Wed, 30 Oct 2024 10:42:30 +0530 Subject: [PATCH 06/12] chore: env variables --- README.md | 41 ++++++++++++++++++++--------------------- 1 file changed, 20 insertions(+), 21 deletions(-) diff --git a/README.md b/README.md index 226ebfe2..ce35a6fc 100644 --- a/README.md +++ b/README.md @@ -27,27 +27,26 @@ Maxun lets you train a robot in 2 minutes and scrape the web on auto-pilot. Web # Installation # Envirnoment Variables -| Variable | Mandatory | Description | If Not Set | -|--------------|-----------|----------------------------|----------------------------- | -| `NODE_ENV` | Yes | Sets whether you are running the app locally or in production. | | -| `JWT_SECRET` | Yes | JWT secret is utilized to generate authentication tokens. | | -| `DB_NAME` | Yes | Brief description here. | Describe what happens here. | -| `DB_USER` | Yes | Brief description here. | Describe what happens here. | -| `DB_PASSWORD` | Yes | Brief description here. | Describe what happens here. | -| `DB_NAME` | Yes | Brief description here. | Describe what happens here. | -| `DB_USER` | Yes | Brief description here. | Describe what happens here. | -| `DB_HOST` | Yes | Sets whether you are running the app locally or in production. | | -| `DB_PORT` | Yes | JWT secret is utilized to generate authentication tokens. | | -| `ENCRYPTION_KEY` | Yes | Brief description here. | Describe what happens here. | -| `MINIO_ENDPOINT` | Yes | Brief description here. | Describe what happens here. | -| `MINIO_PORT` | Yes | Brief description here. | Describe what happens here. | -| `MINIO_ACCESS_KEY` | Yes | Brief description here. | Describe what happens here. | -| `GOOGLE_CLIENT_ID` | Yes | Brief description here. | Describe what happens here. | -| `GOOGLE_CLIENT_SECRET` | Yes | Brief description here. | Describe what happens here. 
| -| `GOOGLE_REDIRECT_URI` | Yes | Brief description here. | Describe what happens here. | -| `REDIS_HOST` | Yes | Brief description here. | Describe what happens here. | -| `REDIS_PORT` | Yes | Brief description here. | Describe what happens here. | -| `MAXUN_TELEMETRY` | No | Brief description here. | Describe what happens here. | +| Variable | Mandatory | Description | If Not Set | +|-----------------------|-----------|----------------------------------------------------------------------------------------------|--------------------------------------------------------------| +| `NODE_ENV` | Yes | Defines the app environment (`development`, `production`). | Defaults to `development`; app may not behave as expected. | +| `JWT_SECRET` | Yes | Secret key used to sign and verify JSON Web Tokens (JWTs) for authentication. | JWT authentication will not work. | +| `DB_NAME` | Yes | Name of the Postgres database to connect to. | Database connection will fail. | +| `DB_USER` | Yes | Username for Postgres database authentication. | Database connection will fail. | +| `DB_PASSWORD` | Yes | Password for Postgres database authentication. | Database connection will fail. | +| `DB_HOST` | Yes | Host address where the Postgres database server is running. | Database connection will fail. | +| `DB_PORT` | Yes | Port number used to connect to the Postgres database server. | Database connection will fail. | +| `ENCRYPTION_KEY` | Yes | Key used for encrypting sensitive data (proxies, passwords). | Encryption functionality will not work. | +| `MINIO_ENDPOINT` | Yes | Endpoint URL for MinIO, to store robot run screenshots. | Connection to MinIO storage will fail. | +| `MINIO_PORT` | Yes | Port number for MinIO service. | Connection to MinIO storage will fail. | +| `MINIO_ACCESS_KEY` | Yes | Access key for authenticating with MinIO. | MinIO authentication will fail. | +| `GOOGLE_CLIENT_ID` | No | Client ID for Google OAuth, used in authentication. | Google login will not work. 
| +| `GOOGLE_CLIENT_SECRET`| No | Client Secret for Google OAuth. | Google login will not work. | +| `GOOGLE_REDIRECT_URI` | No | Redirect URI for handling Google OAuth responses. | Google login will not work. | +| `REDIS_HOST` | Yes | Host address of the Redis server for caching. | Redis connection will fail, affecting performance. | +| `REDIS_PORT` | Yes | Port number for the Redis server. | Redis connection will fail, affecting performance. | +| `MAXUN_TELEMETRY` | No | Disables telemetry to stop sending anonymous usage data. Keeping it enabled helps us understand how the product is used and assess the impact of any new changes. Please keep it enabled. | Telemetry data will not be collected. | + # How Does It Work? From 647c25f687458ad793dbe2116fc458a356df1c46 Mon Sep 17 00:00:00 2001 From: Karishma Shukla Date: Wed, 30 Oct 2024 10:45:02 +0530 Subject: [PATCH 07/12] chore: env description --- README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index ce35a6fc..a07d4c51 100644 --- a/README.md +++ b/README.md @@ -37,14 +37,14 @@ Maxun lets you train a robot in 2 minutes and scrape the web on auto-pilot. Web | `DB_HOST` | Yes | Host address where the Postgres database server is running. | Database connection will fail. | | `DB_PORT` | Yes | Port number used to connect to the Postgres database server. | Database connection will fail. | | `ENCRYPTION_KEY` | Yes | Key used for encrypting sensitive data (proxies, passwords). | Encryption functionality will not work. | -| `MINIO_ENDPOINT` | Yes | Endpoint URL for MinIO, to store robot run screenshots. | Connection to MinIO storage will fail. | +| `MINIO_ENDPOINT` | Yes | Endpoint URL for MinIO, to store Robot Run Screenshots. | Connection to MinIO storage will fail. | | `MINIO_PORT` | Yes | Port number for MinIO service. | Connection to MinIO storage will fail. | | `MINIO_ACCESS_KEY` | Yes | Access key for authenticating with MinIO. | MinIO authentication will fail. 
| -| `GOOGLE_CLIENT_ID` | No | Client ID for Google OAuth, used in authentication. | Google login will not work. | +| `GOOGLE_CLIENT_ID` | No | Client ID for Google OAuth, used for Google Sheet integration authentication. | Google login will not work. | | `GOOGLE_CLIENT_SECRET`| No | Client Secret for Google OAuth. | Google login will not work. | | `GOOGLE_REDIRECT_URI` | No | Redirect URI for handling Google OAuth responses. | Google login will not work. | -| `REDIS_HOST` | Yes | Host address of the Redis server for caching. | Redis connection will fail, affecting performance. | -| `REDIS_PORT` | Yes | Port number for the Redis server. | Redis connection will fail, affecting performance. | +| `REDIS_HOST` | Yes | Host address of the Redis server, used by BullMQ for scheduling robots. | Redis connection will fail. | +| `REDIS_PORT` | Yes | Port number for the Redis server. | Redis connection will fail. | | `MAXUN_TELEMETRY` | No | Disables telemetry to stop sending anonymous usage data. Keeping it enabled helps us understand how the product is used and assess the impact of any new changes. Please keep it enabled. | Telemetry data will not be collected. | From 4e8db3be92ecfc80697a71d37aa78755162dc0ae Mon Sep 17 00:00:00 2001 From: Karishma Shukla Date: Wed, 30 Oct 2024 10:56:10 +0530 Subject: [PATCH 08/12] chore: node env --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a07d4c51..b8dbd9e6 100644 --- a/README.md +++ b/README.md @@ -29,7 +29,7 @@ Maxun lets you train a robot in 2 minutes and scrape the web on auto-pilot. Web # Envirnoment Variables | Variable | Mandatory | Description | If Not Set | |-----------------------|-----------|----------------------------------------------------------------------------------------------|--------------------------------------------------------------| -| `NODE_ENV` | Yes | Defines the app environment (`development`, `production`). 
| Defaults to `development`; app may not behave as expected. | +| `NODE_ENV` | Yes | Defines the app environment (`development`, `production`). | Defaults to `development`. | | `JWT_SECRET` | Yes | Secret key used to sign and verify JSON Web Tokens (JWTs) for authentication. | JWT authentication will not work. | | `DB_NAME` | Yes | Name of the Postgres database to connect to. | Database connection will fail. | | `DB_USER` | Yes | Username for Postgres database authentication. | Database connection will fail. | From a01ca42353ed9748fc534a29c4d9f7c0b6d2f9b6 Mon Sep 17 00:00:00 2001 From: Karishma Shukla Date: Wed, 30 Oct 2024 11:02:56 +0530 Subject: [PATCH 09/12] chore: local setup --- README.md | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/README.md b/README.md index b8dbd9e6..f624a107 100644 --- a/README.md +++ b/README.md @@ -25,6 +25,29 @@ Maxun lets you train a robot in 2 minutes and scrape the web on auto-pilot. Web # Installation +### Docker + +### Local Setup +1. Ensure you have Node.js, PostgreSQL, MinIO and Redis installed on your system. +2. 
Run the commands below: +``` +git clone https://github.com/getmaxun/maxun + +# change directory to the project root +cd maxun + +# install dependencies +npm install + +# change directory to maxun-core to install dependencies +cd maxun-core +npm install + +# start frontend and backend together +npm run start +``` +You can access the frontend at http://localhost:5173/ and backend at http://localhost:8080/ + # Envirnoment Variables | Variable | Mandatory | Description | If Not Set | From 053f5cb3f60b443f94d5a4c6236440d9b17a8557 Mon Sep 17 00:00:00 2001 From: Karishma Shukla Date: Wed, 30 Oct 2024 11:10:30 +0530 Subject: [PATCH 10/12] chore: feedback form --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index f624a107..c1ca8f90 100644 --- a/README.md +++ b/README.md @@ -77,12 +77,12 @@ Maxun lets you create custom robots which emulate user actions and extract data. ![Screenshot 2024-10-23 222138](https://github.com/user-attachments/assets/53573c98-769e-490d-829e-ada9fac0764f) -### 1. Robot Actions +## 1. Robot Actions 1. Capture List: Useful to extract structured and bulk items from the website. Example: Scrape products from Amazon etc. 2. Capture Text: Useful to extract individual text content from the website. 3. Capture Screenshot: Get fullpage or visible section screenshots of the website. -### 2. BYOP +## 2. BYOP BYOP (Bring Your Own Proxy) lets you connect external proxies to bypass anti-bot protection. Currently, the proxies are per user. Soon you'll be able to configure proxy per robot. @@ -101,7 +101,7 @@ BYOP (Bring Your Own Proxy) lets you connect external proxies to bypass anti-bot We offer a managed cloud version to run Maxun without having to manage the infrastructure and extract data at scale. Maxun cloud also deals with anti-bot detection, huge proxy network with automatic proxy rotation, and CAPTCHA solving. 
If this interests you, [join the cloud waitlist](https://docs.google.com/forms/d/e/1FAIpQLSdbD2uhqC4sbg4eLZ9qrFbyrfkXZ2XsI6dQ0USRCQNZNn5pzg/viewform) as we launch soon. # Note -This project is in early stages of development. We're actively working to improve the product. +This project is in early stages of development. Your feedback is very important for us - we're actively working to improve the product. Drop anonymous feedback here. # Contributing Please refer to [Contribution Guide](https://github.com/amhsirak/maxun/blob/master/.github/CONTRIBUTING.md). From 86aec48448367f7bf13adbd8a1d68a109c19f548 Mon Sep 17 00:00:00 2001 From: Karishma Shukla Date: Wed, 30 Oct 2024 11:58:56 +0530 Subject: [PATCH 11/12] chore: wip docker --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index c1ca8f90..e71db720 100644 --- a/README.md +++ b/README.md @@ -26,6 +26,7 @@ Maxun lets you train a robot in 2 minutes and scrape the web on auto-pilot. Web # Installation ### Docker +⚠️ Work In Progress. ### Local Setup 1. Ensure you have Node.js, PostgreSQL, MinIO and Redis installed on your system. From 63b6bc83d0c71087073cf4620039520210bf422d Mon Sep 17 00:00:00 2001 From: Karishma Shukla Date: Wed, 30 Oct 2024 12:06:23 +0530 Subject: [PATCH 12/12] chore: docker wip --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e71db720..c68d274b 100644 --- a/README.md +++ b/README.md @@ -26,7 +26,7 @@ Maxun lets you train a robot in 2 minutes and scrape the web on auto-pilot. Web # Installation ### Docker -⚠️ Work In Progress. +⚠️ Work In Progress. Will be available by EOD. ### Local Setup 1. Ensure you have Node.js, PostgreSQL, MinIO and Redis installed on your system.
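
Reviewer note on the environment variables table from PATCH 06/12 to 08/12: most "If Not Set" entries describe a failure that only surfaces later (database, MinIO, or Redis connection errors). A startup check can report every missing mandatory variable at once. The following is a minimal sketch, not code from the Maxun repository: `REQUIRED_VARS`, `Env`, and `findMissingEnv` are hypothetical names, and the list is copied from the rows the table marks `Yes` under "Mandatory".

```typescript
// Hypothetical startup check; these names are illustrative and are
// not taken from the Maxun codebase.
type Env = Record<string, string | undefined>;

// Variables the README table marks as mandatory ("Yes").
const REQUIRED_VARS: readonly string[] = [
  "NODE_ENV",
  "JWT_SECRET",
  "DB_NAME",
  "DB_USER",
  "DB_PASSWORD",
  "DB_HOST",
  "DB_PORT",
  "ENCRYPTION_KEY",
  "MINIO_ENDPOINT",
  "MINIO_PORT",
  "MINIO_ACCESS_KEY",
  "REDIS_HOST",
  "REDIS_PORT",
];

// Returns the names of mandatory variables that are unset or blank,
// in the same order as REQUIRED_VARS.
function findMissingEnv(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js process this could be called as `findMissingEnv(process.env)` before the server starts, exiting with a message that lists the returned names. The optional rows (`GOOGLE_CLIENT_ID`, `GOOGLE_CLIENT_SECRET`, `GOOGLE_REDIRECT_URI`, `MAXUN_TELEMETRY`) are left out on purpose, since the table says only the Google integration or telemetry is affected when they are absent.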