docs(concurrent): refactor theme and added benchmark searchgraph

2026-06-25 21:11:11 +08:00 · 2024-05-14 02:21:46 +02:00 · 2024-05-14 02:21:46 +02:00 · ced2bbcdc9
commit ced2bbcdc9
parent c0d26d61d7
12 changed files with 64 additions and 11 deletions
--- a/docs/assets/searchgraph.png
+++ b/docs/assets/searchgraph.png
--- a/docs/assets/smartscrapergraph.png
+++ b/docs/assets/smartscrapergraph.png
--- a/docs/assets/speechgraph.png
+++ b/docs/assets/speechgraph.png
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -14,14 +14,16 @@ import sys
 # import all the modules
 sys.path.insert(0, os.path.abspath('../../'))

-project = 'scrapegraphai'
-copyright = '2024, Marco Vinciguerra'
-author = 'Marco Vinciguerra'
+project = 'ScrapeGraphAI'
+copyright = '2024, ScrapeGraphAI'
+author = 'Marco Vinciguerra, Marco Perini, Lorenzo Padoan'
+
+html_last_updated_fmt = "%b %d, %Y"

 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

-extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']
+extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon', 'sphinx_wagtail_theme']

 templates_path = ['_templates']
 exclude_patterns = []
@@ -29,4 +31,19 @@ exclude_patterns = []
 # -- Options for HTML output -------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

-html_theme = 'sphinx_rtd_theme'
+# html_theme = 'sphinx_rtd_theme'
+html_theme = 'sphinx_wagtail_theme'
+
+html_theme_options = dict(
+    project_name = "ScrapeGraphAI",
+    logo = "scrapegraphai_logo.png",
+    logo_alt = "ScrapeGraphAI",
+    logo_height = 59,
+    logo_url = "https://scrapegraph-ai.readthedocs.io/en/latest/",
+    logo_width = 45,
+    github_url = "https://github.com/VinciGit00/Scrapegraph-ai/tree/main/docs/source/",
+    footer_links = ",".join(
+        ["Landing Page|https://scrapegraphai.com/",
+         "Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro"]
+    ),
+)
--- a/docs/source/getting_started/installation.rst
+++ b/docs/source/getting_started/installation.rst
@@ -21,7 +21,9 @@ The library is available on PyPI, so it can be installed using the following com

   pip install scrapegraphai

-**Note:** It is highly recommended to install the library in a virtual environment (conda, venv, etc.)
+.. important::
+   
+   It is highly recommended to install the library in a virtual environment (conda, venv, etc.)

 If you clone the repository, you can install the library using `poetry <https://python-poetry.org/docs/>`_:

--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -24,6 +24,7 @@
   scrapers/graphs
   scrapers/llm
   scrapers/graph_config
+   scrapers/benchmarks

 .. toctree::
   :maxdepth: 2
--- a/docs/source/introduction/overview.rst
+++ b/docs/source/introduction/overview.rst
@@ -24,12 +24,14 @@ This flexibility ensures that scrapers remain functional even when website layou
 We support many Large Language Models (LLMs) including GPT, Gemini, Groq, Azure, Hugging Face etc.
 as well as local models which can run on your machine using Ollama.

-Diagram
-=======
+Library Diagram
+===============
+
 With ScrapegraphAI you first construct a pipeline of steps you want to execute by combining nodes into a graph.
 Executing the graph takes care of all the steps that are often part of scraping: fetching, parsing etc...
 Finally the scraped and processed data gets fed to an LLM which generates a response.

 .. image:: ../../assets/project_overview_diagram.png
   :align: center
+   :width: 70%
   :alt: ScrapegraphAI Overview
--- a/docs/source/scrapers/benchmarks.rst
+++ b/docs/source/scrapers/benchmarks.rst
@@ -0,0 +1,23 @@
+Benchmarks
+==========
+
+SearchGraph
+^^^^^^^^^^^
+
+`SearchGraph` instantiates multiple `SmartScraperGraph` objects, one for each URL, and extracts the data from the HTML.
+A concurrent approach is used to speed up the process. The following table shows the time required for a scraping task with different **batch sizes**.
+Only two results are taken into account.
+
+.. list-table:: SearchGraph
+   :header-rows: 1
+
+   * - Batch Size
+     - Total Time (s)
+   * - 1
+     - 31.1
+   * - 2
+     - 33.52
+   * - 4
+     - 28.47
+   * - 16
+     - 21.80
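
Note on the table above: the speedup each batch size implies over batch size 1 can be worked out directly from the listed timings. The snippet below is only a sketch of that arithmetic; the timings are copied from the table, while the `timings` dictionary and the `speedup` helper are illustrative names that are not part of the commit.

```python
# Timings copied from the SearchGraph table: batch size -> total time in seconds.
timings = {1: 31.1, 2: 33.52, 4: 28.47, 16: 21.80}

# Batch size 1 is used as the baseline for comparison.
baseline = timings[1]


def speedup(batch_size: int) -> float:
    """Return how many times faster a batch size is than batch size 1."""
    return baseline / timings[batch_size]


for size in sorted(timings):
    print(f"batch size {size:>2}: {timings[size]:6.2f} s, speedup x{speedup(size):.2f}")
```

By this calculation batch size 16 is roughly 1.43 times faster than batch size 1, batch size 4 roughly 1.09 times faster, and batch size 2 slightly slower (about 0.93 times the baseline speed). The commit does not say how many runs were averaged, so small differences should be read with care.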
--- a/docs/source/scrapers/graph_config.rst
+++ b/docs/source/scrapers/graph_config.rst
@@ -1,3 +1,5 @@
+.. _Configuration:
+
 Additional Parameters
 =====================

--- a/docs/source/scrapers/graphs.rst
+++ b/docs/source/scrapers/graphs.rst
@@ -9,7 +9,9 @@ There are currently three types of graphs available in the library:
 - **SearchGraph**: multi-page scraper that only requires a user-defined prompt to extract information from a search engine using LLM. It is built on top of SmartScraperGraph.
 - **SpeechGraph**: text-to-speech pipeline that generates an answer as well as a requested audio file. It is built on top of SmartScraperGraph and requires a user-defined prompt and a URL (or local file).

-**Note:** they all use a graph configuration to set up LLM models and other parameters. To find out more about the configurations, check the `LLM`_ and `Configuration`_ sections.
+.. note::
+
+   They all use a graph configuration to set up LLM models and other parameters. To find out more about the configurations, check the :ref:`LLM` and :ref:`Configuration` sections.

 SmartScraperGraph
 ^^^^^^^^^^^^^^^^^
--- a/docs/source/scrapers/llm.rst
+++ b/docs/source/scrapers/llm.rst
@@ -1,3 +1,5 @@
+.. _llm:
+
 LLM
 ===

@@ -7,7 +9,9 @@ These models are specified inside the graph configuration dictionary and can be
 - **Local Models**: These models are hosted on the local machine and can be used without any API key.
 - **API-based Models**: These models are hosted on the cloud and require an API key to access them (e.g. OpenAI, Groq, etc.).

-**Note**: If the embedding model is not specified, the library will use the default one for that LLM, if available.
+.. note::
+
+    If the embedding model is not specified, the library will use the default one for that LLM, if available.

 Local Models
 ------------
--- a/requirements-dev.txt
+++ b/requirements-dev.txt
@@ -1,3 +1,3 @@
 sphinx==7.1.2
-sphinx-rtd-theme==2.0.0
+sphinx-wagtail-theme==6.3.0
 pytest==8.0.0
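
Note on the concurrent approach described in `benchmarks.rst`: the commit documents that one scraper runs per URL and that work is done concurrently with a configurable batch size, but it does not show how. The sketch below is one possible reading, not the library's implementation. It assumes "batch size" means a cap on the number of scrapes in flight and enforces that cap with an `asyncio.Semaphore`. The names `scrape_one` and `scrape_all` and the example URLs are hypothetical, and the scrape itself is simulated with a short sleep.

```python
import asyncio


async def scrape_one(url: str) -> dict:
    """Hypothetical stand-in for running one scraper graph on a single URL."""
    await asyncio.sleep(0.01)  # simulates network and LLM latency
    return {"url": url, "answer": f"data from {url}"}


async def scrape_all(urls: list[str], batch_size: int) -> list[dict]:
    """Scrape every URL with at most `batch_size` scrapes in flight at once."""
    semaphore = asyncio.Semaphore(batch_size)

    async def bounded(url: str) -> dict:
        # Each task waits here until one of the `batch_size` slots is free.
        async with semaphore:
            return await scrape_one(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


urls = [f"https://example.com/page{i}" for i in range(4)]
results = asyncio.run(scrape_all(urls, batch_size=2))
print([r["url"] for r in results])
```

A semaphore gives a sliding window of concurrent tasks rather than strictly separate batches. If the library processes fixed-size batches one after another, the timing profile would differ from this sketch.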