docs(concurrent): refactor theme and add SearchGraph benchmark

Marco Perini 2024-05-14 02:21:46 +02:00
parent c0d26d61d7
commit ced2bbcdc9
12 changed files with 64 additions and 11 deletions

Binary files changed: three images (50 KiB to 53 KiB, 58 KiB to 60 KiB, 46 KiB to 48 KiB).

@@ -14,14 +14,16 @@ import sys
 # import all the modules
 sys.path.insert(0, os.path.abspath('../../'))
-project = 'scrapegraphai'
-copyright = '2024, Marco Vinciguerra'
-author = 'Marco Vinciguerra'
+project = 'ScrapeGraphAI'
+copyright = '2024, ScrapeGraphAI'
+author = 'Marco Vinciguerra, Marco Perini, Lorenzo Padoan'
+html_last_updated_fmt = "%b %d, %Y"
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
-extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']
+extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon','sphinx_wagtail_theme']
 templates_path = ['_templates']
 exclude_patterns = []
@@ -29,4 +31,19 @@ exclude_patterns = []
 # -- Options for HTML output -------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
-html_theme = 'sphinx_rtd_theme'
+# html_theme = 'sphinx_rtd_theme'
+html_theme = 'sphinx_wagtail_theme'
+html_theme_options = dict(
+    project_name = "ScrapeGraphAI",
+    logo = "scrapegraphai_logo.png",
+    logo_alt = "ScrapeGraphAI",
+    logo_height = 59,
+    logo_url = "https://scrapegraph-ai.readthedocs.io/en/latest/",
+    logo_width = 45,
+    github_url = "https://github.com/VinciGit00/Scrapegraph-ai/tree/main/docs/source/",
+    footer_links = ",".join(
+        ["Landing Page|https://scrapegraphai.com/",
+         "Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro"]
+    ),
+)
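
Two of the values introduced in `conf.py` can be previewed in plain Python. This sketch assumes that `html_last_updated_fmt` is read as a `strftime`-style pattern (as the Sphinx configuration docs describe) with English month names, and that the wagtail theme reads `footer_links` as comma-separated `label|url` pairs, which is the string the `",".join(...)` call above builds. `preview_last_updated` and `parse_footer_links` are hypothetical helpers for illustration, not part of Sphinx or the theme.

```python
from datetime import date

# Values mirrored from the conf.py changes in this commit.
HTML_LAST_UPDATED_FMT = "%b %d, %Y"
FOOTER_LINKS = ",".join(
    ["Landing Page|https://scrapegraphai.com/",
     "Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro"]
)


def preview_last_updated(day: date) -> str:
    """Format a date with the same strftime pattern (assumes the default C/English locale)."""
    return day.strftime(HTML_LAST_UPDATED_FMT)


def parse_footer_links(value: str) -> list[tuple[str, str]]:
    """Hypothetical helper: split 'label|url,label|url' back into (label, url) pairs."""
    pairs = []
    for item in value.split(","):
        label, url = item.split("|", 1)
        pairs.append((label, url))
    return pairs


print(preview_last_updated(date(2024, 5, 14)))  # May 14, 2024
for label, url in parse_footer_links(FOOTER_LINKS):
    print(f"{label} -> {url}")
```

Because the pairs are separated by `,` and `|`, neither character can appear inside a label or URL under this format.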

@@ -21,7 +21,9 @@ The library is available on PyPI, so it can be installed using the following com
 pip install scrapegraphai
-**Note:** It is highly recommended to install the library in a virtual environment (conda, venv, etc.)
+.. important::
+   It is highly recommended to install the library in a virtual environment (conda, venv, etc.)
 If you clone the repository, you can install the library using `poetry <https://python-poetry.org/docs/>`_:
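
To confirm that the active interpreter really is inside a virtual environment before running `pip install scrapegraphai`, one generic check is to compare `sys.prefix` with `sys.base_prefix`, which differ inside a venv or modern virtualenv. This is a plain-Python sketch, not part of ScrapeGraphAI; conda environments are full interpreters and do not pass that comparison, so the sketch assumes an activated conda environment can be recognized by its `CONDA_PREFIX` variable.

```python
import os
import sys


def in_virtual_environment() -> bool:
    """Return True inside a venv/virtualenv or an activated conda environment."""
    # venv and virtualenv point sys.prefix at the environment directory,
    # while sys.base_prefix still points at the base interpreter.
    if sys.prefix != sys.base_prefix:
        return True
    # Conda environments are full interpreters, so the check above fails;
    # an activated conda environment exports CONDA_PREFIX instead.
    return bool(os.environ.get("CONDA_PREFIX"))


if not in_virtual_environment():
    print("Warning: consider installing scrapegraphai in a virtual environment.")
```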

@@ -24,6 +24,7 @@
    scrapers/graphs
    scrapers/llm
    scrapers/graph_config
+   scrapers/benchmarks
 .. toctree::
    :maxdepth: 2

@@ -24,12 +24,14 @@ This flexibility ensures that scrapers remain functional even when website layou
 We support many Large Language Models (LLMs) including GPT, Gemini, Groq, Azure, Hugging Face, etc.,
 as well as local models that can run on your machine using Ollama.
-Diagram
-=======
+Library Diagram
+===============
 With ScrapegraphAI you first construct a pipeline of steps you want to execute by combining nodes into a graph.
 Executing the graph takes care of all the steps that are often part of scraping: fetching, parsing, etc.
 Finally, the scraped and processed data gets fed to an LLM, which generates a response.
 .. image:: ../../assets/project_overview_diagram.png
    :align: center
    :width: 70%
    :alt: ScrapegraphAI Overview
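
The node-and-graph idea described here can be illustrated with a tiny linear pipeline in plain Python. This is a conceptual sketch, not ScrapeGraphAI's API: `run_graph` and the three node functions are hypothetical, the fetch step returns hard-coded HTML instead of making a request, and the LLM step is a stub. Each node takes the shared state dictionary, adds its output, and passes it on.

```python
import re
from typing import Callable, Dict, List

State = Dict[str, str]
Node = Callable[[State], State]


def fetch_node(state: State) -> State:
    # Hypothetical fetch: a real node would download state["url"]; here the HTML is faked.
    state["html"] = "<html><body><h1>Title</h1><p>Hello</p></body></html>"
    return state


def parse_node(state: State) -> State:
    # Crude parsing: strip tags and collapse whitespace into plain text.
    text = re.sub(r"<[^>]+>", " ", state["html"])
    state["text"] = " ".join(text.split())
    return state


def generate_node(state: State) -> State:
    # Stub for the LLM call that answers the user prompt from the parsed text.
    state["answer"] = f"Answer to {state['prompt']!r} based on: {state['text']}"
    return state


def run_graph(nodes: List[Node], state: State) -> State:
    """Run the nodes in order, passing the shared state from one to the next."""
    for node in nodes:
        state = node(state)
    return state


result = run_graph(
    [fetch_node, parse_node, generate_node],
    {"url": "https://example.com", "prompt": "What is the title?"},
)
print(result["answer"])  # Answer to 'What is the title?' based on: Title Hello
```

A real graph may branch or run nodes conditionally; the linear list here only mirrors the fetch, parse, and generate order described in the text.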

@@ -0,0 +1,23 @@
+Benchmarks
+==========
+
+SearchGraph
+^^^^^^^^^^^
+
+``SearchGraph`` instantiates multiple ``SmartScraperGraph`` objects, one for each URL, and extracts the data from the HTML.
+A concurrent approach is used to speed up the process, and the following table shows the time required for a scraping task with different **batch sizes**.
+Only two results are taken into account.
+
+.. list-table:: SearchGraph
+   :header-rows: 1
+
+   * - Batch Size
+     - Total Time (s)
+   * - 1
+     - 31.1
+   * - 2
+     - 33.52
+   * - 4
+     - 28.47
+   * - 16
+     - 21.80
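
The concurrent approach described for `SearchGraph` amounts to scraping the result URLs batch by batch, with the URLs inside one batch processed at the same time. The sketch below illustrates that pattern with `asyncio`; `scrape_one` is a hypothetical coroutine that only sleeps to simulate latency, so it says nothing about how `SearchGraph` is implemented internally or why the measured times vary.

```python
import asyncio
import time
from typing import List


async def scrape_one(url: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for one SmartScraperGraph run on a single URL."""
    await asyncio.sleep(delay)  # simulated fetch + LLM latency
    return f"data from {url}"


async def scrape_in_batches(urls: List[str], batch_size: int) -> List[str]:
    """Process URLs one batch at a time; URLs inside a batch run concurrently."""
    results: List[str] = []
    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        # asyncio.gather returns results in the same order as its arguments
        results.extend(await asyncio.gather(*(scrape_one(url) for url in batch)))
    return results


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(8)]
    for batch_size in (1, 2, 4, 16):
        start = time.perf_counter()
        asyncio.run(scrape_in_batches(urls, batch_size))
        print(f"batch_size={batch_size}: {time.perf_counter() - start:.2f} s")
```

With simulated latency only, larger batches simply finish sooner; real runs also depend on network and LLM response times, which this sketch does not model. In the measurements above, going from a batch size of 1 (31.1 s) to 16 (21.80 s) reduced the total time by about 30% (31.1 / 21.80 ≈ 1.43), while a batch size of 2 was slower than 1.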

@@ -1,3 +1,5 @@
+.. _Configuration:
+
 Additional Parameters
 =====================

@@ -9,7 +9,9 @@ There are currently three types of graphs available in the library:
 - **SearchGraph**: multi-page scraper that only requires a user-defined prompt to extract information from a search engine using an LLM. It is built on top of SmartScraperGraph.
 - **SpeechGraph**: text-to-speech pipeline that generates an answer as well as a requested audio file. It is built on top of SmartScraperGraph and requires a user-defined prompt and a URL (or local file).
-**Note:** they all use a graph configuration to set up LLM models and other parameters. To find out more about the configurations, check the `LLM`_ and `Configuration`_ sections.
+.. note::
+   They all use a graph configuration to set up LLMs and other parameters. To find out more about the configurations, check the :ref:`LLM` and :ref:`Configuration` sections.
 SmartScraperGraph
 ^^^^^^^^^^^^^^^^^
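
For orientation, a graph configuration is a plain dictionary. The two below are modeled on the examples in the project README, one for an API-based model and one for a local model served by Ollama; the key names and model identifiers are assumptions that may not match every library version, so the LLM and Configuration pages remain the reference. Only the dictionaries are built here; the call into the library is left as a comment because it needs the package installed and a reachable model.

```python
import os

# API-based model: needs an API key (read from the environment, not hard-coded).
openai_config = {
    "llm": {
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
        "model": "gpt-3.5-turbo",
    },
}

# Local model served by Ollama: no API key, but the server URL is given.
ollama_config = {
    "llm": {
        "model": "ollama/mistral",
        "temperature": 0,
        "format": "json",
        "base_url": "http://localhost:11434",
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",
    },
}

# With the library installed, either dict is passed to a graph, for example:
# from scrapegraphai.graphs import SmartScraperGraph
# graph = SmartScraperGraph(prompt="...", source="https://...", config=ollama_config)
# result = graph.run()
```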

@@ -1,3 +1,5 @@
+.. _llm:
+
 LLM
 ===
@@ -7,7 +9,9 @@ These models are specified inside the graph configuration dictionary and can be
 - **Local Models**: These models are hosted on the local machine and can be used without any API key.
 - **API-based Models**: These models are hosted on the cloud and require an API key to access them (e.g. OpenAI, Groq, etc.).
-**Note**: If the embedding model is not specified, the library will use the default one for that LLM, if available.
+.. note::
+   If the embedding model is not specified, the library will use the default one for that LLM, if available.
 Local Models
 ------------
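
The fallback described in the note can be sketched as a small resolution step: use the `embeddings` entry when the configuration provides one, otherwise look up a default for the LLM's provider, and give up if none exists. Everything here is hypothetical: `resolve_embeddings`, the `provider/model` naming convention used to find the provider, and the placeholder entries in `DEFAULT_EMBEDDERS` illustrate the rule and are not ScrapeGraphAI's actual code or defaults.

```python
from typing import Dict, Optional

# Hypothetical defaults keyed by provider prefix; placeholder model names only.
DEFAULT_EMBEDDERS: Dict[str, str] = {
    "ollama": "ollama/placeholder-embedder",
    "openai": "openai/placeholder-embedder",
}


def provider_of(model: str) -> Optional[str]:
    """Return the provider prefix of a 'provider/model' name, or None if there is no prefix."""
    return model.split("/", 1)[0] if "/" in model else None


def resolve_embeddings(config: dict) -> Optional[dict]:
    """Pick the embeddings config: explicit entry first, then a provider default, else None."""
    if "embeddings" in config:
        return config["embeddings"]
    provider = provider_of(config["llm"]["model"])
    default = DEFAULT_EMBEDDERS.get(provider) if provider else None
    return {"model": default} if default else None


print(resolve_embeddings({"llm": {"model": "ollama/mistral"}}))
# {'model': 'ollama/placeholder-embedder'}
```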

@@ -1,3 +1,3 @@
 sphinx==7.1.2
-sphinx-rtd-theme==2.0.0
+sphinx-wagtail-theme==6.3.0
 pytest==8.0.0
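
Every entry in this requirements file is meant to pin an exact version with `==`. The hypothetical helper below flags lines that do not follow that shape. It uses a deliberately simplified regular expression rather than a full PEP 508 parser (extras, environment markers, and other specifiers are not handled), so it is a quick sanity check, not a validator.

```python
import re
from typing import List

# name==version, e.g. "sphinx==7.1.2"; simplified, not the full PEP 508 grammar.
PIN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[0-9][A-Za-z0-9.]*$")


def unpinned(requirements: str) -> List[str]:
    """Return non-empty, non-comment lines that are not of the form name==version."""
    bad = []
    for line in requirements.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and not PIN.match(line):
            bad.append(line)
    return bad


print(unpinned("sphinx==7.1.2\nsphinx-wagtail-theme==6.3.0\npytest==8.0.0"))  # []
```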