From b1df16181831d8bc9f5b40595b924f68c213496c Mon Sep 17 00:00:00 2001
From: kahwoo <github@kahwoo.com>
Date: Wed, 8 May 2024 20:39:54 +1000
Subject: [PATCH 1/6] Update examples.rst

Fix the code-block formatting and add the other models the example needs (nomic-embed-text, mistral).
---
 docs/source/getting_started/examples.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/source/getting_started/examples.rst b/docs/source/getting_started/examples.rst
index 11fb5a05..b6e2eb36 100644
--- a/docs/source/getting_started/examples.rst
+++ b/docs/source/getting_started/examples.rst
@@ -44,9 +44,12 @@ Local models
 
 Remember to have installed in your pc ollama `ollama <https://ollama.com/>`
 Remember to pull the right model for LLM and for the embeddings, like:
+
 .. code-block:: bash
 
    ollama pull llama3
+   ollama pull nomic-embed-text
+   ollama pull mistral
 
 After that, you can run the following code, using only your machine resources brum brum brum:
 

From 0ca52b1da672d7e6f126d25c0658f4a114b206d5 Mon Sep 17 00:00:00 2001
From: semantic-release-bot <semantic-release-bot@martynus.net>
Date: Wed, 8 May 2024 13:52:51 +0000
Subject: [PATCH 2/6] ci(release): 0.10.0 [skip ci]

## [0.10.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0...v0.10.0) (2024-05-08)

### Features

* add claude documentation ([5bdee55](https://github.com/VinciGit00/Scrapegraph-ai/commit/5bdee558760521bab818efc6725739e2a0f55d20))
* add gemini embeddings ([79daa4c](https://github.com/VinciGit00/Scrapegraph-ai/commit/79daa4c112e076e9c5f7cd70bbbc6f5e4930832c))
* add llava integration ([019b722](https://github.com/VinciGit00/Scrapegraph-ai/commit/019b7223dc969c87c3c36b6a42a19b4423b5d2af))
* add new hugging_face models ([d5547a4](https://github.com/VinciGit00/Scrapegraph-ai/commit/d5547a450ccd8908f1cf73707142b3481fbc6baa))
* Fix bug for gemini case when embeddings config not passed ([726de28](https://github.com/VinciGit00/Scrapegraph-ai/commit/726de288982700dab8ab9f22af8e26f01c6198a7))
* fixed custom_graphs example and robots_node ([84fcb44](https://github.com/VinciGit00/Scrapegraph-ai/commit/84fcb44aaa36e84f775884138d04f4a60bb389be))
* multiple graph instances ([dbb614a](https://github.com/VinciGit00/Scrapegraph-ai/commit/dbb614a8dd88d7667fe3daaf0263f5d6e9be1683))
* **node:** multiple url search in SearchGraph + fixes ([930adb3](https://github.com/VinciGit00/Scrapegraph-ai/commit/930adb38f2154ba225342466bfd1846c47df72a0))
* refactoring search function ([aeb1acb](https://github.com/VinciGit00/Scrapegraph-ai/commit/aeb1acbf05e63316c91672c99d88f8a6f338147f))

### Bug Fixes

* bug on .toml ([f7d66f5](https://github.com/VinciGit00/Scrapegraph-ai/commit/f7d66f51818dbdfddd0fa326f26265a3ab686b20))
* **llm:** fixed gemini api_key ([fd01b73](https://github.com/VinciGit00/Scrapegraph-ai/commit/fd01b73b71b515206cfdf51c1d52136293494389))
* **examples:** local, mixed models and fixed SearchGraph embeddings problem ([6b71ec1](https://github.com/VinciGit00/Scrapegraph-ai/commit/6b71ec1d2be953220b6767bc429f4cf6529803fd))
* **examples:** openai std examples ([186c0d0](https://github.com/VinciGit00/Scrapegraph-ai/commit/186c0d035d1d211aff33c38c449f2263d9716a07))
* removed .lock file for deployment ([d4c7d4e](https://github.com/VinciGit00/Scrapegraph-ai/commit/d4c7d4e7fcc2110beadcb2fc91efc657ec6a485c))

### Docs

* update README.md ([17ec992](https://github.com/VinciGit00/Scrapegraph-ai/commit/17ec992b498839e001277e7bc3f0ebea49fbd00d))

### CI

* **release:** 0.10.0-beta.1 [skip ci] ([c47a505](https://github.com/VinciGit00/Scrapegraph-ai/commit/c47a505750ee63e0220b339478953155ef1f1771))
* **release:** 0.10.0-beta.2 [skip ci] ([3f0e069](https://github.com/VinciGit00/Scrapegraph-ai/commit/3f0e0694f3b08463f025586777f7c0594b5ecb14))
* **release:** 0.9.0-beta.2 [skip ci] ([5aa600c](https://github.com/VinciGit00/Scrapegraph-ai/commit/5aa600cb0a85d320ad8dc786af26ffa46dd4d097))
* **release:** 0.9.0-beta.3 [skip ci] ([da8c72c](https://github.com/VinciGit00/Scrapegraph-ai/commit/da8c72ce138bcfe2627924d25a67afcd22cfafd5))
* **release:** 0.9.0-beta.4 [skip ci] ([8c5397f](https://github.com/VinciGit00/Scrapegraph-ai/commit/8c5397f67a9f05e0c00f631dd297b5527263a888))
* **release:** 0.9.0-beta.5 [skip ci] ([532adb6](https://github.com/VinciGit00/Scrapegraph-ai/commit/532adb639d58640bc89e8b162903b2ed97be9853))
* **release:** 0.9.0-beta.6 [skip ci] ([8c0b46e](https://github.com/VinciGit00/Scrapegraph-ai/commit/8c0b46eb40b446b270c665c11b2c6508f4d5f4be))
* **release:** 0.9.0-beta.7 [skip ci] ([6911e21](https://github.com/VinciGit00/Scrapegraph-ai/commit/6911e21584767460c59c5a563c3fd010857cbb67))
* **release:** 0.9.0-beta.8 [skip ci] ([739aaa3](https://github.com/VinciGit00/Scrapegraph-ai/commit/739aaa33c39c12e7ab7df8a0656cad140b35c9db))
---
 CHANGELOG.md   | 42 ++++++++++++++++++++++++++++++++++++++++++
 pyproject.toml |  2 +-
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index bdd1ccf4..03ea0c69 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,45 @@
+## [0.10.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0...v0.10.0) (2024-05-08)
+
+
+### Features
+
+* add claude documentation ([5bdee55](https://github.com/VinciGit00/Scrapegraph-ai/commit/5bdee558760521bab818efc6725739e2a0f55d20))
+* add gemini embeddings ([79daa4c](https://github.com/VinciGit00/Scrapegraph-ai/commit/79daa4c112e076e9c5f7cd70bbbc6f5e4930832c))
+* add llava integration ([019b722](https://github.com/VinciGit00/Scrapegraph-ai/commit/019b7223dc969c87c3c36b6a42a19b4423b5d2af))
+* add new hugging_face models ([d5547a4](https://github.com/VinciGit00/Scrapegraph-ai/commit/d5547a450ccd8908f1cf73707142b3481fbc6baa))
+* Fix bug for gemini case when embeddings config not passed ([726de28](https://github.com/VinciGit00/Scrapegraph-ai/commit/726de288982700dab8ab9f22af8e26f01c6198a7))
+* fixed custom_graphs example and robots_node ([84fcb44](https://github.com/VinciGit00/Scrapegraph-ai/commit/84fcb44aaa36e84f775884138d04f4a60bb389be))
+* multiple graph instances ([dbb614a](https://github.com/VinciGit00/Scrapegraph-ai/commit/dbb614a8dd88d7667fe3daaf0263f5d6e9be1683))
+* **node:** multiple url search in SearchGraph + fixes ([930adb3](https://github.com/VinciGit00/Scrapegraph-ai/commit/930adb38f2154ba225342466bfd1846c47df72a0))
+* refactoring search function ([aeb1acb](https://github.com/VinciGit00/Scrapegraph-ai/commit/aeb1acbf05e63316c91672c99d88f8a6f338147f))
+
+
+### Bug Fixes
+
+* bug on .toml ([f7d66f5](https://github.com/VinciGit00/Scrapegraph-ai/commit/f7d66f51818dbdfddd0fa326f26265a3ab686b20))
+* **llm:** fixed gemini api_key ([fd01b73](https://github.com/VinciGit00/Scrapegraph-ai/commit/fd01b73b71b515206cfdf51c1d52136293494389))
+* **examples:** local, mixed models and fixed SearchGraph embeddings problem ([6b71ec1](https://github.com/VinciGit00/Scrapegraph-ai/commit/6b71ec1d2be953220b6767bc429f4cf6529803fd))
+* **examples:** openai std examples ([186c0d0](https://github.com/VinciGit00/Scrapegraph-ai/commit/186c0d035d1d211aff33c38c449f2263d9716a07))
+* removed .lock file for deployment ([d4c7d4e](https://github.com/VinciGit00/Scrapegraph-ai/commit/d4c7d4e7fcc2110beadcb2fc91efc657ec6a485c))
+
+
+### Docs
+
+* update README.md ([17ec992](https://github.com/VinciGit00/Scrapegraph-ai/commit/17ec992b498839e001277e7bc3f0ebea49fbd00d))
+
+
+### CI
+
+* **release:** 0.10.0-beta.1 [skip ci] ([c47a505](https://github.com/VinciGit00/Scrapegraph-ai/commit/c47a505750ee63e0220b339478953155ef1f1771))
+* **release:** 0.10.0-beta.2 [skip ci] ([3f0e069](https://github.com/VinciGit00/Scrapegraph-ai/commit/3f0e0694f3b08463f025586777f7c0594b5ecb14))
+* **release:** 0.9.0-beta.2 [skip ci] ([5aa600c](https://github.com/VinciGit00/Scrapegraph-ai/commit/5aa600cb0a85d320ad8dc786af26ffa46dd4d097))
+* **release:** 0.9.0-beta.3 [skip ci] ([da8c72c](https://github.com/VinciGit00/Scrapegraph-ai/commit/da8c72ce138bcfe2627924d25a67afcd22cfafd5))
+* **release:** 0.9.0-beta.4 [skip ci] ([8c5397f](https://github.com/VinciGit00/Scrapegraph-ai/commit/8c5397f67a9f05e0c00f631dd297b5527263a888))
+* **release:** 0.9.0-beta.5 [skip ci] ([532adb6](https://github.com/VinciGit00/Scrapegraph-ai/commit/532adb639d58640bc89e8b162903b2ed97be9853))
+* **release:** 0.9.0-beta.6 [skip ci] ([8c0b46e](https://github.com/VinciGit00/Scrapegraph-ai/commit/8c0b46eb40b446b270c665c11b2c6508f4d5f4be))
+* **release:** 0.9.0-beta.7 [skip ci] ([6911e21](https://github.com/VinciGit00/Scrapegraph-ai/commit/6911e21584767460c59c5a563c3fd010857cbb67))
+* **release:** 0.9.0-beta.8 [skip ci] ([739aaa3](https://github.com/VinciGit00/Scrapegraph-ai/commit/739aaa33c39c12e7ab7df8a0656cad140b35c9db))
+
 ## [0.10.0-beta.2](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0-beta.1...v0.10.0-beta.2) (2024-05-08)
 
 
diff --git a/pyproject.toml b/pyproject.toml
index 39b0d030..498ac4c0 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,7 +1,7 @@
 [tool.poetry]
 name = "scrapegraphai"
 
-version = "0.10.0b2"
+version = "0.10.0"
 
 description = "A web scraping library based on LangChain which uses LLM and direct graph logic to create scraping pipelines."
 authors = [

From f8ce3d5916eab926275d59d4d48b0d89ec9cd43f Mon Sep 17 00:00:00 2001
From: mayurdb <mayurdb31@gmail.com>
Date: Fri, 10 May 2024 13:28:53 +0530
Subject: [PATCH 3/6] fix: Augment the information getting fetched from a
 webpage

---
 scrapegraphai/nodes/fetch_node.py             | 21 ++++++++++++++++---
 .../utils/{remover.py => cleanup_html.py}     | 11 ++++++----
 2 files changed, 25 insertions(+), 7 deletions(-)
 rename scrapegraphai/utils/{remover.py => cleanup_html.py} (78%)

diff --git a/scrapegraphai/nodes/fetch_node.py b/scrapegraphai/nodes/fetch_node.py
index bcd207f3..2667f0be 100644
--- a/scrapegraphai/nodes/fetch_node.py
+++ b/scrapegraphai/nodes/fetch_node.py
@@ -6,7 +6,9 @@ from typing import List, Optional
 from langchain_community.document_loaders import AsyncChromiumLoader
 from langchain_core.documents import Document
 from .base_node import BaseNode
-from ..utils.remover import remover
+from ..utils.cleanup_html import cleanup_html
+import requests
+from bs4 import BeautifulSoup
 
 
 class FetchNode(BaseNode):
@@ -32,6 +34,7 @@ class FetchNode(BaseNode):
     def __init__(self, input: str, output: List[str], node_config: Optional[dict]=None, node_name: str = "Fetch"):
         super().__init__(node_name, "node", input, output, 1)
 
+        self.useSoup = True if node_config is None else node_config.get("useSoup", True)
         self.headless = True if node_config is None else node_config.get("headless", True)
         self.verbose = False if node_config is None else node_config.get("verbose", False)
 
@@ -67,10 +70,22 @@ class FetchNode(BaseNode):
             })]
         # if it is a local directory
         elif not source.startswith("http"):
-            compressed_document = [Document(page_content=remover(source), metadata={
+            compressed_document = [Document(page_content=cleanup_html(source), metadata={
                 "source": "local_dir"
             })]
 
+        elif self.useSoup:
+            response = requests.get(source)
+            if response.status_code == 200:
+                soup = BeautifulSoup(response.text, 'html.parser')
+                links = soup.find_all('a')
+                link_urls = []
+                for link in links:
+                    if 'href' in link.attrs:
+                        link_urls.append(link['href'])
+                compressed_document = [Document(page_content=cleanup_html(soup.prettify(), link_urls))]
+            else:
+                print(f"Failed to retrieve contents from the webpage at url: {source}")
         else:
             if self.node_config is not None and self.node_config.get("endpoint") is not None:
                 
@@ -87,7 +102,7 @@ class FetchNode(BaseNode):
 
             document = loader.load()
             compressed_document = [
-                Document(page_content=remover(str(document[0].page_content)))]
+                Document(page_content=cleanup_html(str(document[0].page_content)))]
 
         state.update({self.output[0]: compressed_document})
         return state
diff --git a/scrapegraphai/utils/remover.py b/scrapegraphai/utils/cleanup_html.py
similarity index 78%
rename from scrapegraphai/utils/remover.py
rename to scrapegraphai/utils/cleanup_html.py
index 5e203249..aab1db65 100644
--- a/scrapegraphai/utils/remover.py
+++ b/scrapegraphai/utils/cleanup_html.py
@@ -5,7 +5,7 @@ from bs4 import BeautifulSoup
 from minify_html import minify
 
 
-def remover(html_content: str) -> str:
+def cleanup_html(html_content: str, urls: list = []) -> str:
     """
     Processes HTML content by removing unnecessary tags, minifying the HTML, and extracting the title and body content.
 
@@ -17,7 +17,7 @@ def remover(html_content: str) -> str:
 
     Example:
         >>> html_content = "<html><head><title>Example</title></head><body><p>Hello World!</p></body></html>"
-        >>> remover(html_content)
+        >>> cleanup_html(html_content)
         'Title: Example, Body: <body><p>Hello World!</p></body>'
 
     This function is particularly useful for preparing HTML content for environments where bandwidth usage needs to be minimized.
@@ -35,9 +35,12 @@ def remover(html_content: str) -> str:
 
     # Body Extraction (if it exists)
     body_content = soup.find('body')
+    urls_content = ""
+    if urls:
+        urls_content = f", URLs in page: {urls}"
     if body_content:
         # Minify the HTML within the body tag
         minimized_body = minify(str(body_content))
-        return "Title: " + title + ", Body: " + minimized_body
+        return "Title: " + title + ", Body: " + minimized_body + urls_content
 
-    return "Title: " + title + ", Body: No body content found"
+    return "Title: " + title + ", Body: No body content found" + urls_content

From 63c0dd93723c2ab55df0a66b555e7fbb4716ea77 Mon Sep 17 00:00:00 2001
From: semantic-release-bot <semantic-release-bot@martynus.net>
Date: Fri, 10 May 2024 09:15:24 +0000
Subject: [PATCH 4/6] ci(release): 0.11.0-beta.1 [skip ci]

## [0.11.0-beta.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0...v0.11.0-beta.1) (2024-05-10)

### Features

* Add support for passing pdf path as source ([f10f3b1](https://github.com/VinciGit00/Scrapegraph-ai/commit/f10f3b1438e0c625b7f2fa52faeb5a6c12116113))
* update info ([4ed0fb8](https://github.com/VinciGit00/Scrapegraph-ai/commit/4ed0fb89c3e6068190a7775bedcb6ae65ba59d18))

### Bug Fixes

* add json integration ([0ab31c3](https://github.com/VinciGit00/Scrapegraph-ai/commit/0ab31c3fdbd56652ed306e60109301f60e8042d3))
* Augment the information getting fetched from a webpage ([f8ce3d5](https://github.com/VinciGit00/Scrapegraph-ai/commit/f8ce3d5916eab926275d59d4d48b0d89ec9cd43f))
* fixed bugs for csv and xml ([324e977](https://github.com/VinciGit00/Scrapegraph-ai/commit/324e977b853ecaa55bac4bf86e7cd927f7f43d0d))
* limit python version to < 3.12 ([a37fbbc](https://github.com/VinciGit00/Scrapegraph-ai/commit/a37fbbcbcfc3ddd0cc66f586f279676b52c4abfe))

### CI

* **release:** 0.10.0-beta.3 [skip ci] ([ad32298](https://github.com/VinciGit00/Scrapegraph-ai/commit/ad32298e70fc626fd62c897e153b806f79dba9b9))
* **release:** 0.10.0-beta.4 [skip ci] ([548bff9](https://github.com/VinciGit00/Scrapegraph-ai/commit/548bff9d77c8b4d2aadee40e966a06cc9d7fd4ab))
* **release:** 0.10.0-beta.5 [skip ci] ([28c9dce](https://github.com/VinciGit00/Scrapegraph-ai/commit/28c9dce7cbda49750172bafd7767fa48a0c33859))
* **release:** 0.10.0-beta.6 [skip ci] ([460d292](https://github.com/VinciGit00/Scrapegraph-ai/commit/460d292af21fabad3fdd2b66110913ccee22ba91))
---
 CHANGELOG.md   | 23 +++++++++++++++++++++++
 pyproject.toml |  2 +-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index dffb9062..5e781284 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,26 @@
+## [0.11.0-beta.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0...v0.11.0-beta.1) (2024-05-10)
+
+
+### Features
+
+* Add support for passing pdf path as source ([f10f3b1](https://github.com/VinciGit00/Scrapegraph-ai/commit/f10f3b1438e0c625b7f2fa52faeb5a6c12116113))
+* update info ([4ed0fb8](https://github.com/VinciGit00/Scrapegraph-ai/commit/4ed0fb89c3e6068190a7775bedcb6ae65ba59d18))
+
+
+### Bug Fixes
+
+* add json integration ([0ab31c3](https://github.com/VinciGit00/Scrapegraph-ai/commit/0ab31c3fdbd56652ed306e60109301f60e8042d3))
+* Augment the information getting fetched from a webpage ([f8ce3d5](https://github.com/VinciGit00/Scrapegraph-ai/commit/f8ce3d5916eab926275d59d4d48b0d89ec9cd43f))
+* fixed bugs for csv and xml ([324e977](https://github.com/VinciGit00/Scrapegraph-ai/commit/324e977b853ecaa55bac4bf86e7cd927f7f43d0d))
+* limit python version to < 3.12 ([a37fbbc](https://github.com/VinciGit00/Scrapegraph-ai/commit/a37fbbcbcfc3ddd0cc66f586f279676b52c4abfe))
+
+
+### CI
+
+* **release:** 0.10.0-beta.3 [skip ci] ([ad32298](https://github.com/VinciGit00/Scrapegraph-ai/commit/ad32298e70fc626fd62c897e153b806f79dba9b9))
+* **release:** 0.10.0-beta.4 [skip ci] ([548bff9](https://github.com/VinciGit00/Scrapegraph-ai/commit/548bff9d77c8b4d2aadee40e966a06cc9d7fd4ab))
+* **release:** 0.10.0-beta.5 [skip ci] ([28c9dce](https://github.com/VinciGit00/Scrapegraph-ai/commit/28c9dce7cbda49750172bafd7767fa48a0c33859))
+* **release:** 0.10.0-beta.6 [skip ci] ([460d292](https://github.com/VinciGit00/Scrapegraph-ai/commit/460d292af21fabad3fdd2b66110913ccee22ba91))
 
 ### Bug Fixes
 
diff --git a/pyproject.toml b/pyproject.toml
index 9cd6f618..074aedcc 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,7 +1,7 @@
 [tool.poetry]
 name = "scrapegraphai"
 
-version = "0.10.0b6"
+version = "0.11.0b1"
 
 description = "A web scraping library based on LangChain which uses LLM and direct graph logic to create scraping pipelines."
 authors = [

From 864aa91326c360992326e04811d272e55eac8355 Mon Sep 17 00:00:00 2001
From: Marco Perini <perinim.98@gmail.com>
Date: Fri, 10 May 2024 15:11:54 +0200
Subject: [PATCH 5/6] feat: revert fetch_node

---
 scrapegraphai/nodes/fetch_node.py             | 23 ++++---------------
 scrapegraphai/utils/__init__.py               |  1 +
 .../utils/{cleanup_html.py => remover.py}     | 11 ++++-----
 3 files changed, 9 insertions(+), 26 deletions(-)
 rename scrapegraphai/utils/{cleanup_html.py => remover.py} (78%)

diff --git a/scrapegraphai/nodes/fetch_node.py b/scrapegraphai/nodes/fetch_node.py
index eeb2d0b4..3eabc66f 100644
--- a/scrapegraphai/nodes/fetch_node.py
+++ b/scrapegraphai/nodes/fetch_node.py
@@ -8,9 +8,7 @@ from langchain_community.document_loaders import AsyncChromiumLoader
 from langchain_core.documents import Document
 from langchain_community.document_loaders import PyPDFLoader
 from .base_node import BaseNode
-from ..utils.cleanup_html import cleanup_html
-import requests
-from bs4 import BeautifulSoup
+from ..utils.remover import remover
 
 
 class FetchNode(BaseNode):
@@ -36,7 +34,6 @@ class FetchNode(BaseNode):
     def __init__(self, input: str, output: List[str], node_config: Optional[dict] = None, node_name: str = "Fetch"):
         super().__init__(node_name, "node", input, output, 1)
 
-
         self.headless = True if node_config is None else node_config.get(
             "headless", True)
         self.verbose = False if node_config is None else node_config.get(
@@ -97,22 +94,10 @@ class FetchNode(BaseNode):
             pass
 
         elif not source.startswith("http"):
-            compressed_document = [Document(page_content=cleanup_html(source), metadata={
+            compressed_document = [Document(page_content=remover(source), metadata={
                 "source": "local_dir"
             })]
 
-        elif self.useSoup:
-            response = requests.get(source)
-            if response.status_code == 200:
-                soup = BeautifulSoup(response.text, 'html.parser')
-                links = soup.find_all('a')
-                link_urls = []
-                for link in links:
-                    if 'href' in link.attrs:
-                        link_urls.append(link['href'])
-                compressed_document = [Document(page_content=cleanup_html(soup.prettify(), link_urls))]
-            else:
-                print(f"Failed to retrieve contents from the webpage at url: {url}")
         else:
             if self.node_config is not None and self.node_config.get("endpoint") is not None:
 
@@ -129,7 +114,7 @@ class FetchNode(BaseNode):
 
             document = loader.load()
             compressed_document = [
-                Document(page_content=cleanup_html(str(document[0].page_content)))]
+                Document(page_content=remover(str(document[0].page_content)))]
 
         state.update({self.output[0]: compressed_document})
-        return state
+        return state
\ No newline at end of file
diff --git a/scrapegraphai/utils/__init__.py b/scrapegraphai/utils/__init__.py
index 0aee7839..218506f3 100644
--- a/scrapegraphai/utils/__init__.py
+++ b/scrapegraphai/utils/__init__.py
@@ -6,3 +6,4 @@ from .convert_to_csv import convert_to_csv
 from .convert_to_json import convert_to_json
 from .prettify_exec_info import prettify_exec_info
 from .proxy_rotation import proxy_generator
+from .remover import remover
diff --git a/scrapegraphai/utils/cleanup_html.py b/scrapegraphai/utils/remover.py
similarity index 78%
rename from scrapegraphai/utils/cleanup_html.py
rename to scrapegraphai/utils/remover.py
index aab1db65..c5a0507b 100644
--- a/scrapegraphai/utils/cleanup_html.py
+++ b/scrapegraphai/utils/remover.py
@@ -5,7 +5,7 @@ from bs4 import BeautifulSoup
 from minify_html import minify
 
 
-def cleanup_html(html_content: str, urls: list = []) -> str:
+def remover(html_content: str) -> str:
     """
     Processes HTML content by removing unnecessary tags, minifying the HTML, and extracting the title and body content.
 
@@ -17,7 +17,7 @@ def cleanup_html(html_content: str, urls: list = []) -> str:
 
     Example:
         >>> html_content = "<html><head><title>Example</title></head><body><p>Hello World!</p></body></html>"
-        >>> cleanup_html(html_content)
+        >>> remover(html_content)
         'Title: Example, Body: <body><p>Hello World!</p></body>'
 
     This function is particularly useful for preparing HTML content for environments where bandwidth usage needs to be minimized.
@@ -35,12 +35,9 @@ def cleanup_html(html_content: str, urls: list = []) -> str:
 
     # Body Extraction (if it exists)
     body_content = soup.find('body')
-    urls_content = ""
-    if urls:
-        urls_content = f", URLs in page: {urls}"
     if body_content:
         # Minify the HTML within the body tag
         minimized_body = minify(str(body_content))
-        return "Title: " + title + ", Body: " + minimized_body + urls_content
+        return "Title: " + title + ", Body: " + minimized_body
 
-    return "Title: " + title + ", Body: No body content found" + urls_content
+    return "Title: " + title + ", Body: No body content found"
\ No newline at end of file

From 7ae50c035e87be9a3d7b5eef42232dae6e345914 Mon Sep 17 00:00:00 2001
From: semantic-release-bot <semantic-release-bot@martynus.net>
Date: Fri, 10 May 2024 13:13:20 +0000
Subject: [PATCH 6/6] ci(release): 0.11.0-beta.2 [skip ci]

## [0.11.0-beta.2](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.1...v0.11.0-beta.2) (2024-05-10)

### Features

* revert fetch_node ([864aa91](https://github.com/VinciGit00/Scrapegraph-ai/commit/864aa91326c360992326e04811d272e55eac8355))
---
 CHANGELOG.md   | 7 +++++++
 pyproject.toml | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5e781284..4d89d3f4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,10 @@
+## [0.11.0-beta.2](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.11.0-beta.1...v0.11.0-beta.2) (2024-05-10)
+
+
+### Features
+
+* revert fetch_node ([864aa91](https://github.com/VinciGit00/Scrapegraph-ai/commit/864aa91326c360992326e04811d272e55eac8355))
+
 ## [0.11.0-beta.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0...v0.11.0-beta.1) (2024-05-10)
 
 
diff --git a/pyproject.toml b/pyproject.toml
index 074aedcc..df00dfce 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,7 +1,7 @@
 [tool.poetry]
 name = "scrapegraphai"
 
-version = "0.11.0b1"
+version = "0.11.0b2"
 
 description = "A web scraping library based on LangChain which uses LLM and direct graph logic to create scraping pipelines."
 authors = [