mirror of https://github.com/VinciGit00/Scrapegraph-ai.git
synced 2026-06-23 21:00:30 +08:00
docs(graph): added new graphs and schema
parent 5684578fab
commit d27cad5911
@@ -3,21 +3,29 @@ Graphs
 
 Graphs are scraping pipelines aimed at solving specific tasks. They are composed by nodes which can be configured individually to address different aspects of the task (fetching data, extracting information, etc.).
 
-There are three types of graphs available in the library:
+There are several types of graphs available in the library, each with its own purpose and functionality. The most common ones are:
 
-- **SmartScraperGraph**: one-page scraper that requires a user-defined prompt and a URL (or local file) to extract information from using LLM.
+- **SmartScraperGraph**: one-page scraper that requires a user-defined prompt and a URL (or local file) to extract information using LLM.
+- **SmartScraperMultiGraph**: multi-page scraper that requires a user-defined prompt and a list of URLs (or local files) to extract information using LLM. It is built on top of SmartScraperGraph.
 - **SearchGraph**: multi-page scraper that only requires a user-defined prompt to extract information from a search engine using LLM. It is built on top of SmartScraperGraph.
 - **SpeechGraph**: text-to-speech pipeline that generates an answer as well as a requested audio file. It is built on top of SmartScraperGraph and requires a user-defined prompt and a URL (or local file).
+- **ScriptCreatorGraph**: script generator that creates a Python script to scrape a website using the specified library (e.g. BeautifulSoup). It requires a user-defined prompt and a URL (or local file).
 
 With the introduction of `GPT-4o`, two new powerful graphs have been created:
 
 - **OmniScraperGraph**: similar to `SmartScraperGraph`, but with the ability to scrape images and describe them.
 - **OmniSearchGraph**: similar to `SearchGraph`, but with the ability to scrape images and describe them.
 
+
 .. note::
 
    They all use a graph configuration to set up LLM models and other parameters. To find out more about the configurations, check the :ref:`LLM` and :ref:`Configuration` sections.
 
+
+.. note::
+
+   We can pass an optional `schema` parameter to the graph constructor to specify the output schema. If not provided or set to `None`, the schema will be generated by the LLM itself.
+
 OmniScraperGraph
 ^^^^^^^^^^^^^^^^
 
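Reviewer note on the `schema` parameter described in the note added above: a minimal sketch of how the optional schema might be prepared and forwarded to a graph constructor. The diff does not show which type or format the library expects, so the JSON-style string, the field names, the `graph_config` values, and the helper `build_smart_scraper` are all assumptions made for illustration. Only the keyword arguments `prompt`, `source`, `config`, and `schema` come from the diff itself.

```python
import json

# Hypothetical output schema. The field names are illustrative, and passing it
# as a JSON string is an assumption: the diff does not show the expected type.
schema = json.dumps(
    {"projects": [{"title": "string", "description": "string"}]},
    indent=2,
)

# Hypothetical graph configuration; the real keys are covered by the LLM and
# Configuration sections that the docs refer to.
graph_config = {
    "llm": {
        "api_key": "YOUR_API_KEY",  # placeholder, not a real key
        "model": "gpt-4o",          # assumed model identifier
    },
}


def build_smart_scraper(prompt, source, config, schema=None):
    """Create a SmartScraperGraph and forward the optional schema.

    The import is deferred so the schema above can be inspected without the
    library installed.
    """
    from scrapegraphai.graphs import SmartScraperGraph

    return SmartScraperGraph(
        prompt=prompt,
        source=source,
        config=config,
        schema=schema,  # None leaves the output shape to the LLM, per the note
    )
```

Calling `build_smart_scraper(...)` and then `.run()` on the result would mirror the constructor calls changed in the hunks of this commit; leaving `schema` at `None` corresponds to the behaviour the note describes.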
@@ -41,7 +49,8 @@ It will fetch the data from the source and extract the information based on the
    omni_scraper_graph = OmniScraperGraph(
       prompt="List me all the projects with their titles and image links and descriptions.",
       source="https://perinim.github.io/projects",
-      config=graph_config
+      config=graph_config,
+      schema=schema
    )
 
    result = omni_scraper_graph.run()
@@ -70,15 +79,16 @@ It will create a search query, fetch the first n results from the search engine,
    # Create the OmniSearchGraph instance
    omni_search_graph = OmniSearchGraph(
       prompt="List me all Chioggia's famous dishes and describe their pictures.",
-      config=graph_config
+      config=graph_config,
+      schema=schema
    )
 
    # Run the graph
    result = omni_search_graph.run()
    print(result)
 
-SmartScraperGraph
-^^^^^^^^^^^^^^^^^
+SmartScraperGraph & SmartScraperMultiGraph
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. image:: ../../assets/smartscrapergraph.png
    :align: center
@@ -100,12 +110,14 @@ It will fetch the data from the source and extract the information based on the
    smart_scraper_graph = SmartScraperGraph(
       prompt="List me all the projects with their descriptions",
       source="https://perinim.github.io/projects",
-      config=graph_config
+      config=graph_config,
+      schema=schema
    )
 
    result = smart_scraper_graph.run()
    print(result)
 
+**SmartScraperMultiGraph** is similar to SmartScraperGraph, but it can handle multiple sources. We define the graph configuration, create an instance of the SmartScraperMultiGraph class, and run the graph.
 
 SearchGraph
 ^^^^^^^^^^^
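Reviewer note on the SmartScraperMultiGraph paragraph added in this hunk: the text names three steps (define the configuration, create the instance, run the graph) but the commit adds no example for them. A minimal sketch of those steps follows. It assumes the class is importable from `scrapegraphai.graphs` and takes the list of URLs through the same `source` keyword the single-page graph uses; the URL list, the `graph_config` values, and the helper `run_multi_scraper` are hypothetical.

```python
# Step 1: define the graph configuration (hypothetical values).
graph_config = {
    "llm": {
        "api_key": "YOUR_API_KEY",  # placeholder, not a real key
        "model": "gpt-4o",          # assumed model identifier
    },
}

# Illustrative list of sources; the second URL is the one used in the docs.
sources = [
    "https://perinim.github.io/",
    "https://perinim.github.io/projects",
]


def run_multi_scraper(prompt, sources, config, schema=None):
    """Create a SmartScraperMultiGraph over several sources and run it.

    Assumption: the list of URLs is passed through the `source` keyword.
    The import is deferred so this sketch loads without the library installed.
    """
    from scrapegraphai.graphs import SmartScraperMultiGraph

    # Step 2: create the instance.
    graph = SmartScraperMultiGraph(
        prompt=prompt,
        source=sources,
        config=config,
        schema=schema,
    )
    # Step 3: run the graph and return its answer.
    return graph.run()
```

With the library installed and a valid key, `run_multi_scraper("List me all the projects with their descriptions", sources, graph_config)` would follow the same define, create, run sequence as the SmartScraperGraph example above it.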
@@ -132,7 +144,8 @@ It will create a search query, fetch the first n results from the search engine,
    # Create the SearchGraph instance
    search_graph = SearchGraph(
       prompt="List me all the traditional recipes from Chioggia",
-      config=graph_config
+      config=graph_config,
+      schema=schema
    )
 
    # Run the graph
@@ -169,6 +182,7 @@ It will fetch the data from the source, extract the information based on the pro
       prompt="Make a detailed audio summary of the projects.",
       source="https://perinim.github.io/projects/",
       config=graph_config,
+      schema=schema
    )
 
    result = speech_graph.run()
@@ -12,7 +12,6 @@ authors = [
     { name = "Lorenzo Padoan", email = "lorenzo.padoan977@gmail.com" }
 ]
 dependencies = [
-    # python = ">=3.9, <3.12"
     "langchain==0.1.15",
     "langchain-openai==0.1.6",
     "langchain-google-genai==1.0.3",
@@ -32,8 +31,6 @@ dependencies = [
     "playwright==1.43.0",
     "google==3.0.0",
     "yahoo-search-py==0.3",
-    "networkx==3.3",
-    "pyvis==0.3.2",
     "undetected-playwright==0.3.0",
 ]
 