mirror of
https://github.com/VinciGit00/Scrapegraph-ai.git
synced 2026-06-04 21:01:04 +08:00
Add SmartScraperMultiBatchGraph that uses the OpenAI Batch API for LLM
calls, providing ~50% cost savings when real-time results aren't needed.
Key features:
- SmartScraperMultiBatchGraph: 3-phase pipeline (fetch/parse → batch
submit → merge) that separates HTML fetching from LLM generation
- BatchGenerateAnswerNode: collects prompts from all URLs and submits
them as a single OpenAI Batch API request
- utils/batch_api.py: helpers for creating, polling, and retrieving
batch results with doc_id → URL mapping
- Per-document error handling: partial failures don't break the batch
- Configurable polling interval and max wait time
- OpenAI-only validation (rejects non-OpenAI providers gracefully)
- Results sorted by custom_id for consistent ordering
- 18 unit tests with 100% pass rate
Usage:
graph = SmartScraperMultiBatchGraph(
prompt='Extract key points',
source=['https://url1.com', 'https://url2.com'],
config={'llm': {'model': 'openai/gpt-4o-mini'}}
)
result = graph.run()
Closes #1036
|
||
|---|---|---|
| .. | ||
| builders | ||
| docloaders | ||
| graphs | ||
| helpers | ||
| integrations | ||
| models | ||
| nodes | ||
| prompts | ||
| telemetry | ||
| utils | ||
| __init__.py | ||