mirror of
https://github.com/VinciGit00/Scrapegraph-ai.git
synced 2026-06-04 21:01:04 +08:00
Add SmartScraperMultiBatchGraph that uses the OpenAI Batch API for LLM
calls, providing ~50% cost savings when real-time results aren't needed.
Key features:
- SmartScraperMultiBatchGraph: 3-phase pipeline (fetch/parse → batch
submit → merge) that separates HTML fetching from LLM generation
- BatchGenerateAnswerNode: collects prompts from all URLs and submits
them as a single OpenAI Batch API request
- utils/batch_api.py: helpers for creating, polling, and retrieving
batch results with doc_id → URL mapping
- Per-document error handling: partial failures don't break the batch
- Configurable polling interval and max wait time
- OpenAI-only validation (rejects non-OpenAI providers gracefully)
- Results sorted by custom_id for consistent ordering
- 18 unit tests with 100% pass rate
Usage:
graph = SmartScraperMultiBatchGraph(
prompt='Extract key points',
source=['https://url1.com', 'https://url2.com'],
config={'llm': {'model': 'openai/gpt-4o-mini'}}
)
result = graph.run()
Closes #1036
|
||
|---|---|---|
| .. | ||
| __init__.py | ||
| abstract_graph.py | ||
| base_graph.py | ||
| code_generator_graph.py | ||
| csv_scraper_graph.py | ||
| csv_scraper_multi_graph.py | ||
| depth_search_graph.py | ||
| document_scraper_graph.py | ||
| document_scraper_multi_graph.py | ||
| json_scraper_graph.py | ||
| json_scraper_multi_graph.py | ||
| markdownify_graph.py | ||
| omni_scraper_graph.py | ||
| omni_search_graph.py | ||
| screenshot_scraper_graph.py | ||
| script_creator_graph.py | ||
| script_creator_multi_graph.py | ||
| search_graph.py | ||
| search_link_graph.py | ||
| smart_scraper_graph.py | ||
| smart_scraper_lite_graph.py | ||
| smart_scraper_multi_batch_graph.py | ||
| smart_scraper_multi_concat_graph.py | ||
| smart_scraper_multi_graph.py | ||
| smart_scraper_multi_lite_graph.py | ||
| speech_graph.py | ||
| xml_scraper_graph.py | ||
| xml_scraper_multi_graph.py | ||