mirror of
https://github.com/VinciGit00/Scrapegraph-ai.git
synced 2026-06-25 21:11:11 +08:00
docs(concurrent): refactor theme and add SearchGraph benchmark
parent c0d26d61d7
commit ced2bbcdc9
Binary file not shown. Size before: 50 KiB; after: 53 KiB.
Binary file not shown. Size before: 58 KiB; after: 60 KiB.
Binary file not shown. Size before: 46 KiB; after: 48 KiB.
@@ -14,14 +14,16 @@ import sys
 # import all the modules
 sys.path.insert(0, os.path.abspath('../../'))
 
-project = 'scrapegraphai'
-copyright = '2024, Marco Vinciguerra'
-author = 'Marco Vinciguerra'
+project = 'ScrapeGraphAI'
+copyright = '2024, ScrapeGraphAI'
+author = 'Marco Vinciguerra, Marco Perini, Lorenzo Padoan'
 
+html_last_updated_fmt = "%b %d, %Y"
+
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
 
-extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']
+extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon', 'sphinx_wagtail_theme']
 
 templates_path = ['_templates']
 exclude_patterns = []
@@ -29,4 +31,19 @@ exclude_patterns = []
 # -- Options for HTML output -------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
 
-html_theme = 'sphinx_rtd_theme'
+# html_theme = 'sphinx_rtd_theme'
+html_theme = 'sphinx_wagtail_theme'
+
+html_theme_options = dict(
+    project_name = "ScrapeGraphAI",
+    logo = "scrapegraphai_logo.png",
+    logo_alt = "ScrapeGraphAI",
+    logo_height = 59,
+    logo_url = "https://scrapegraph-ai.readthedocs.io/en/latest/",
+    logo_width = 45,
+    github_url = "https://github.com/VinciGit00/Scrapegraph-ai/tree/main/docs/source/",
+    footer_links = ",".join(
+        ["Landing Page|https://scrapegraphai.com/",
+         "Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro"]
+    ),
+)
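The `footer_links` option in the hunk above is passed as a single string rather than a list. A minimal sketch of what the `",".join(...)` expression evaluates to is shown here. The split back into pairs is only an assumption about how such a value can be read (the theme's own parsing is not shown in this commit), and the name `pairs` is hypothetical.

```python
# Build the same value that conf.py passes to the theme.
footer_links = ",".join(
    ["Landing Page|https://scrapegraphai.com/",
     "Docusaurus|https://scrapegraph-doc.onrender.com/docs/intro"]
)

# The result is one comma-separated string of "label|url" entries.
print(footer_links)

# Assumption for illustration only: recover (label, url) pairs by
# splitting on "," and then on the first "|".
pairs = [tuple(entry.split("|", 1)) for entry in footer_links.split(",")]
for label, url in pairs:
    print(f"{label} -> {url}")
```

Because entries are separated by commas and fields by a pipe, labels and URLs written this way should avoid both characters.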
@@ -21,7 +21,9 @@ The library is available on PyPI, so it can be installed using the following com
 
    pip install scrapegraphai
 
-**Note:** It is highly recommended to install the library in a virtual environment (conda, venv, etc.)
+.. important::
+
+   It is highly recommended to install the library in a virtual environment (conda, venv, etc.)
 
 If you clone the repository, you can install the library using `poetry <https://python-poetry.org/docs/>`_:
 
@@ -24,6 +24,7 @@
    scrapers/graphs
    scrapers/llm
    scrapers/graph_config
+   scrapers/benchmarks
 
 .. toctree::
    :maxdepth: 2
@@ -24,12 +24,14 @@ This flexibility ensures that scrapers remain functional even when website layou
 We support many Large Language Models (LLMs) including GPT, Gemini, Groq, Azure, Hugging Face etc.
 as well as local models which can run on your machine using Ollama.
 
-Diagram
-=======
+Library Diagram
+===============
+
 With ScrapegraphAI you first construct a pipeline of steps you want to execute by combining nodes into a graph.
 Executing the graph takes care of all the steps that are often part of scraping: fetching, parsing, etc.
 Finally, the scraped and processed data is fed to an LLM, which generates a response.
 
 .. image:: ../../assets/project_overview_diagram.png
    :align: center
+   :width: 70%
    :alt: ScrapegraphAI Overview
docs/source/scrapers/benchmarks.rst (normal file, 23 lines added)
@@ -0,0 +1,23 @@
+Benchmarks
+==========
+
+SearchGraph
+^^^^^^^^^^^
+
+`SearchGraph` instantiates a `SmartScraperGraph` object for each URL and extracts the data from the HTML.
+A concurrent approach is used to speed up the process, and the following table shows the time required for a scraping task with different **batch sizes**.
+Only two results are taken into account.
+
+.. list-table:: SearchGraph
+   :header-rows: 1
+
+   * - Batch Size
+     - Total Time (s)
+   * - 1
+     - 31.1
+   * - 2
+     - 33.52
+   * - 4
+     - 28.47
+   * - 16
+     - 21.80
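The batching idea described in the new benchmarks page can be sketched as follows. This is a minimal illustration, not the library's actual implementation: `scrape_one` and `scrape_in_batches` are hypothetical names, the per-URL work is simulated with a short sleep, and a thread pool is only one assumed way of running a batch concurrently. The timings it prints are not comparable to the table above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_one(url: str) -> str:
    """Hypothetical stand-in for running one scraper on one URL."""
    time.sleep(0.01)  # simulate network and LLM latency
    return f"result for {url}"


def scrape_in_batches(urls: list[str], batch_size: int) -> list[str]:
    """Process URLs in consecutive batches; each batch runs concurrently."""
    results: list[str] = []
    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # pool.map returns results in the same order as the inputs.
            results.extend(pool.map(scrape_one, batch))
    return results


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(8)]
    for batch_size in (1, 2, 4, 16):
        started = time.perf_counter()
        scrape_in_batches(urls, batch_size)
        elapsed = time.perf_counter() - started
        print(f"batch_size={batch_size}: {elapsed:.2f}s")
```

With a batch size of 1 the URLs are processed one after another, so larger batches mainly help when the per-URL work is dominated by waiting on I/O.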
@@ -1,3 +1,5 @@
+.. _Configuration:
+
 Additional Parameters
 =====================
 
@@ -9,7 +9,9 @@ There are currently three types of graphs available in the library:
 - **SearchGraph**: multi-page scraper that only requires a user-defined prompt to extract information from a search engine using an LLM. It is built on top of SmartScraperGraph.
 - **SpeechGraph**: text-to-speech pipeline that generates an answer as well as a requested audio file. It is built on top of SmartScraperGraph and requires a user-defined prompt and a URL (or local file).
 
-**Note:** they all use a graph configuration to set up LLM models and other parameters. To find out more about the configurations, check the `LLM`_ and `Configuration`_ sections.
+.. note::
+
+   They all use a graph configuration to set up LLM models and other parameters. To find out more about the configurations, check the :ref:`LLM` and :ref:`Configuration` sections.
 
 SmartScraperGraph
 ^^^^^^^^^^^^^^^^^
@@ -1,3 +1,5 @@
+.. _llm:
+
 LLM
 ===
 
@@ -7,7 +9,9 @@ These models are specified inside the graph configuration dictionary and can be
 - **Local Models**: These models are hosted on the local machine and can be used without any API key.
 - **API-based Models**: These models are hosted on the cloud and require an API key to access them (e.g. OpenAI, Groq, etc.).
 
-**Note**: If the embedding model is not specified, the library will use the default one for that LLM, if available.
+.. note::
+
+   If the embedding model is not specified, the library will use the default one for that LLM, if available.
 
 Local Models
 ------------
@@ -1,3 +1,3 @@
 sphinx==7.1.2
-sphinx-rtd-theme==2.0.0
+sphinx-wagtail-theme==6.3.0
 pytest==8.0.0