Scrapegraph-ai/scrapegraphai
MrAliHasan 9d4eba1f15 feat: add OpenAI Batch API support for SmartScraperMultiGraph (#1036)
Add SmartScraperMultiBatchGraph that uses the OpenAI Batch API for LLM
calls, providing ~50% cost savings when real-time results aren't needed.

Key features:
- SmartScraperMultiBatchGraph: 3-phase pipeline (fetch/parse → batch
  submit → merge) that separates HTML fetching from LLM generation
- BatchGenerateAnswerNode: collects prompts from all URLs and submits
  them as a single OpenAI Batch API request
- utils/batch_api.py: helpers for creating, polling, and retrieving
  batch results with doc_id → URL mapping
- Per-document error handling: partial failures don't break the batch
- Configurable polling interval and max wait time
- OpenAI-only validation (rejects non-OpenAI providers gracefully)
- Results sorted by custom_id for consistent ordering
- 18 unit tests with 100% pass rate

Usage:
  graph = SmartScraperMultiBatchGraph(
      prompt='Extract key points',
      source=['https://url1.com', 'https://url2.com'],
      config={'llm': {'model': 'openai/gpt-4o-mini'}}
  )
  result = graph.run()

Closes #1036
2026-02-21 03:17:15 +05:00
..
builders codebeaver/pre/beta-963 - . 2025-04-14 07:50:46 +00:00
docloaders codebeaver/pre/beta-963 - . 2025-04-14 07:50:46 +00:00
graphs feat: add OpenAI Batch API support for SmartScraperMultiGraph (#1036) 2026-02-21 03:17:15 +05:00
helpers Merge branch 'main' into pre/beta 2025-06-06 12:46:34 +02:00
integrations feat: ⛏️ enhanced contribution and precommit added 2025-01-06 15:10:35 +01:00
models fix: grok integration and add new grok models 2025-05-31 00:13:44 +07:00
nodes feat: add OpenAI Batch API support for SmartScraperMultiGraph (#1036) 2026-02-21 03:17:15 +05:00
prompts Merge pull request #993 from ScrapeGraphAI/main 2025-06-24 17:29:48 +02:00
telemetry feat: ⛏️ enhanced contribution and precommit added 2025-01-06 15:10:35 +01:00
utils feat: add OpenAI Batch API support for SmartScraperMultiGraph (#1036) 2026-02-21 03:17:15 +05:00
__init__.py feat: update logs 2025-06-07 16:53:55 +02:00