diff --git a/docs/assets/searchgraph.png b/docs/assets/searchgraph.png index f57c652e..ab841b1d 100644 Binary files a/docs/assets/searchgraph.png and b/docs/assets/searchgraph.png differ diff --git a/docs/assets/smartscrapergraph.png b/docs/assets/smartscrapergraph.png index 021531a3..54707f8e 100644 Binary files a/docs/assets/smartscrapergraph.png and b/docs/assets/smartscrapergraph.png differ diff --git a/docs/assets/speechgraph.png b/docs/assets/speechgraph.png index 70b13062..e61c0346 100644 Binary files a/docs/assets/speechgraph.png and b/docs/assets/speechgraph.png differ diff --git a/docs/source/conf.py b/docs/source/conf.py index 3f323d6a..60b871a9 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -14,14 +14,16 @@ import sys # import all the modules sys.path.insert(0, os.path.abspath('../../')) -project = 'scrapegraphai' -copyright = '2024, Marco Vinciguerra' -author = 'Marco Vinciguerra' +project = 'ScrapeGraphAI' +copyright = '2024, ScrapeGraphAI' +author = 'Marco Vinciguerra, Marco Perini, Lorenzo Padoan' + +html_last_updated_fmt = "%b %d, %Y" # -- General configuration --------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration -extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon'] +extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon','sphinx_wagtail_theme'] templates_path = ['_templates'] exclude_patterns = [] @@ -29,4 +31,19 @@ exclude_patterns = [] # -- Options for HTML output ------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output -html_theme = 'sphinx_rtd_theme' +# html_theme = 'sphinx_rtd_theme' +html_theme = 'sphinx_wagtail_theme' + +html_theme_options = dict( + project_name = "ScrapeGraphAI", + logo = "scrapegraphai_logo.png", + logo_alt = "ScrapeGraphAI", + logo_height = 59, + logo_url = "https://scrapegraph-ai.readthedocs.io/en/latest/", + logo_width = 45, + 
github_url = "https://github.com/VinciGit00/Scrapegraph-ai/tree/main/docs/source/", + footer_links = ",".join( + ["Landing Page|https://scrapegraphai.com/", + "Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro"] + ), +) \ No newline at end of file diff --git a/docs/source/getting_started/installation.rst b/docs/source/getting_started/installation.rst index 3e40f1c3..55a7361d 100644 --- a/docs/source/getting_started/installation.rst +++ b/docs/source/getting_started/installation.rst @@ -21,7 +21,9 @@ The library is available on PyPI, so it can be installed using the following com pip install scrapegraphai -**Note:** It is higly recommended to install the library in a virtual environment (conda, venv, etc.) +.. important:: + + It is highly recommended to install the library in a virtual environment (conda, venv, etc.) If your clone the repository, you can install the library using `poetry `_: diff --git a/docs/source/index.rst b/docs/source/index.rst index ab0c6180..3a5fa6fe 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -24,6 +24,7 @@ scrapers/graphs scrapers/llm scrapers/graph_config + scrapers/benchmarks .. toctree:: :maxdepth: 2 diff --git a/docs/source/introduction/overview.rst b/docs/source/introduction/overview.rst index 1ed4167c..867e50cc 100644 --- a/docs/source/introduction/overview.rst +++ b/docs/source/introduction/overview.rst @@ -24,12 +24,14 @@ This flexibility ensures that scrapers remain functional even when website layou We support many Large Language Models (LLMs) including GPT, Gemini, Groq, Azure, Hugging Face etc. as well as local models which can run on your machine using Ollama. -Diagram -======= +Library Diagram +=============== + With ScrapegraphAI you first construct a pipeline of steps you want to execute by combining nodes into a graph. Executing the graph takes care of all the steps that are often part of scraping: fetching, parsing etc... 
Finally the scraped and processed data gets fed to an LLM which generates a response. .. image:: ../../assets/project_overview_diagram.png :align: center + :width: 70% :alt: ScrapegraphAI Overview diff --git a/docs/source/scrapers/benchmarks.rst b/docs/source/scrapers/benchmarks.rst new file mode 100644 index 00000000..b5521ef1 --- /dev/null +++ b/docs/source/scrapers/benchmarks.rst @@ -0,0 +1,23 @@ +Benchmarks +========== + +SearchGraph +^^^^^^^^^^^ + +`SearchGraph` instantiates multiple `SmartScraperGraph` objects, one for each URL, and extracts the data from the HTML. +A concurrent approach is used to speed up the process, and the following table shows the time required for a scraping task with different **batch sizes**. +Only two results are taken into account. + +.. list-table:: SearchGraph + :header-rows: 1 + + * - Batch Size + - Total Time (s) + * - 1 + - 31.1 + * - 2 + - 33.52 + * - 4 + - 28.47 + * - 16 + - 21.80 diff --git a/docs/source/scrapers/graph_config.rst b/docs/source/scrapers/graph_config.rst index a5ade9c5..dfc2062c 100644 --- a/docs/source/scrapers/graph_config.rst +++ b/docs/source/scrapers/graph_config.rst @@ -1,3 +1,5 @@ +.. _Configuration: + Additional Parameters ===================== diff --git a/docs/source/scrapers/graphs.rst b/docs/source/scrapers/graphs.rst index efd87537..cbcf1859 100644 --- a/docs/source/scrapers/graphs.rst +++ b/docs/source/scrapers/graphs.rst @@ -9,7 +9,9 @@ There are currently three types of graphs available in the library: - **SearchGraph**: multi-page scraper that only requires a user-defined prompt to extract information from a search engine using LLM. It is built on top of SmartScraperGraph. - **SpeechGraph**: text-to-speech pipeline that generates an answer as well as a requested audio file. It is built on top of SmartScraperGraph and requires a user-defined prompt and a URL (or local file). -**Note:** they all use a graph configuration to set up LLM models and other parameters. 
To find out more about the configurations, check the `LLM`_ and `Configuration`_ sections. +.. note:: + + They all use a graph configuration to set up LLM models and other parameters. To find out more about the configurations, check the :ref:`LLM` and :ref:`Configuration` sections. SmartScraperGraph ^^^^^^^^^^^^^^^^^ diff --git a/docs/source/scrapers/llm.rst b/docs/source/scrapers/llm.rst index 486668b1..c22844d2 100644 --- a/docs/source/scrapers/llm.rst +++ b/docs/source/scrapers/llm.rst @@ -1,3 +1,5 @@ +.. _llm: + LLM === @@ -7,7 +9,9 @@ These models are specified inside the graph configuration dictionary and can be - **Local Models**: These models are hosted on the local machine and can be used without any API key. - **API-based Models**: These models are hosted on the cloud and require an API key to access them (eg. OpenAI, Groq, etc). -**Note**: If the emebedding model is not specified, the library will use the default one for that LLM, if available. +.. note:: + + If the embedding model is not specified, the library will use the default one for that LLM, if available. Local Models ------------ diff --git a/requirements-dev.txt b/requirements-dev.txt index 12d0e42f..e178a448 100644 --- a/requirements-dev.txt +++ b/requirements-dev.txt @@ -1,3 +1,3 @@ sphinx==7.1.2 -sphinx-rtd-theme==2.0.0 +sphinx-wagtail-theme==6.3.0 pytest==8.0.0